Many numerical problems can be posed in terms of minimizing something. For instance, you can take a problem like x=2+3 and turn it around to find the value of x that minimizes abs((2+3) - x) (squaring or using absolute value to make the minimum function result zero rather than negative infinity). Lots of times you really are trying to minimize a cost, or in the case of neural networks and other statistical models you might be trying to minimize error. Many kinds of problems can be posed in this form.
The traditional approach is gradient descent of some sort, where you try moving your function input variables around in such a way that they follow the contours of the output, going in the direction that reduces the function result until you hit bottom. This is time-tested and can be good especially if there is only one solution to your problem. There are tricks such as simulated annealing to help you get past obstacles, but gradient descent will always suffer from the fact that it descends gradients. Sometimes there are lots of solutions, some better than others -- these are called local optima, and if there is a best one it is called the global optimum. You would like to find the best region of your problem space, and then you would like to find the best point within that region. And you would like to do this without getting stuck.
No algorithm other than exhaustive search can guarantee finding the global optimum every time (and exhaustive search might take more time than the universe has existed for), but there are some that are better than others across a wide range of problems. For more than twenty years particle swarm optimization as been the go-to tool to solve problems with lots of variables, lots of optima, and other nasty features.
The particle swarm algorithm was originally based on social behavior such as bird flocking, and so it is generally given in terms of some particles which are metaphorical birds; they have a sort of persistent velocity as they move through the search space, which is modified by their own and the other particles' discoveries. Think of it like birds that see other birds behaving as if they see some bird seed, and so they swing around toward that direction to investigate. They are all leaders and all followers of one another.
Today I want to show you an easy way to implement it which is almost as powerful as the standard form. In analyzing the particle swarm years ago I realized that the particles are basically sampling from a Gaussian probability distribution that is centered between the particle and its best neighbor. The beauty of the algorithm is that the standard deviation of the distribution is scaled to the distance between the particles, so at the start of the program when they are spread all over the search space they are likely to search almost anywhere, but as they hone in on a solution neighbors converge and explore together, and their steps become smaller as they cluster in a small region of the search space. It is a self-scaling algorithm, in other words.
First, neighborhoods. The particles learn by communicating with one another through a pattern of connections defined by the researcher. People have gotten into a terrible habit of using the "global best" neighborhood topology, where all the particles are influenced by the one best-performing member of the population. This tends to get trapped in local optima, and I wish I knew why it is used at all. There are lots of better ways to arrange the population so today let's use a traditional topology called Lbest, where every particle has two neighbors, which are the adjacent particles on either side of it in the population array. Wrap the ends. This does it, near the start of your program, before the loop:
for i in range(0,popsize):
Now the population is one big ring. This topology gives the swarm the ability to work on many solutions simultaneously, with various parts of the ring unaffected by what other parts are doing, and only the best solutions are propagated through the population. I usually use a population of twenty particles, by the way, there's no rule but that's good enough. Play around with it.
A particle is a point moving around a high-dimensional Cartesian space, testing its coordinates as inputs to an objective function.
Every particle in the Bare Bones Particle Swarm has two vectors associated with it. One is its current position x, and the other is its previous best position p. Iteratively, if the current position x is better than p, then update p=x. Also you have to remember the value of p's function evaluation, for comparison; it is common to call this pbest. At the end of the trial you will want to know the very best solution found by any member of the population, too, so use a single variable called gbest (or whatever you like), which is the index of the particle with the best solution in the population. Then, p[gbest] will be coordinate values of the best solution you have found.
Looping through the population of individual particles i, first identify i's best neighbor (you could probably take one or the other at random and do nearly as well):
if pbest[nbr[i,1]] < pbest[g]:
then move x[i] to a new point, a new candidate problem solution, where j indexes each variable in the function you are optimizing (this is a nested loop over i and j):
In this code, i is the particle you're modifying, g is its best neighbor, and random.gauss() is a gaussian random number generator that takes mean and s.d. as arguments. Evaluate your objective function using x as the vector of arguments to it, and update p if that is warranted, that is, if f(x) is better (smaller) than pbest. Do this for a thousand iterations or so or until it stops improving, and voila, your problem is solved. Again: play with it. Maybe you need more iterations, maybe less.
The standard deviation of the search distribution is equal to the distance between the particle and its neighbor, which means that each is only a half standard deviation from the mean. So the particle might sample somewhere between them, but it is also likely to go past the neighbor, or even in the opposite direction, away from it. Since we know the distribution is normal, we know that it will sample between them 38.2 percent of the time, and 30.9 percent of the time it will jump past the neighbor; another 30.9 percent of the time the particle will actually go the opposite direction from the "attracting" neighbor.
Figure: the particle samples from a normal distribution centered on each dimension halfway between the individual i and its neighbor g, with standard deviation equal to the distance between them.
Let me point out that this algorithm is for numerical problems that can be posed as minimization, or maximization if you prefer. It is not especially useful for combinatorial problems, where you want to find a permutation or combination of things. There are algorithms for that, including ant-colony swarm algorithms; particle swarms are used for finding solutions in numerical spaces.
By the way, I will point out that the very first particle swarm implementation was the optimization of neural network weights, and it is really good for that, eliminating many of the problems you may find with backprop and other gradient-following methods. You propose a population of networks with random weights, and let them interact iteratively, minimizing squared error at the outputs. It is used quite a lot for that -- try it, and let me know how it works for you.
Reference: Kennedy, J. (2003). Bare bones particle swarms. Proceedings of Swarm Intelligence Symposium, 2003 (SIS’03), 80–87.