Machine learning is increasingly used to build trading algorithms: today’s machine learning algorithms can deal with incredibly complex problems, including portfolio choice. At the same time, there is now a vast amount of data freely available – alongside the computing power to crunch complex data science problems.

Here at ELEKS, one of our colleagues, Volodymyr Mudryi, wanted to explore whether machine learning could train an algorithm to trade profitably. Let’s take a look.

**Building an agent that can trade for profit: an overview**

The stock market is a very complex environment, so we decided to experiment on a simpler market instead: the Steam Community Market. Steam is a popular online gaming platform. Gamers can use the Steam Community Market to buy in-game items from, and sell them to, fellow gamers.

We proceeded to build what’s called a reinforcement learning agent. Reinforcement learning is a field of machine learning in which an agent learns by trial and error: it takes actions in an environment, observes the rewards those actions earn, and adjusts its behavior to maximize the total reward it collects over time.
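A minimal sketch of that agent-environment loop, assuming a single item whose daily prices are replayed from a list: at each step the agent observes a state, chooses an action, and receives a reward. The state encoding, the reward (realized profit after Steam’s 15% commission, the figure used in the rules below), and the hand-written `threshold_policy` are hypothetical; a reinforcement learning agent would learn its policy from the rewards instead of having it written by hand.

```python
from typing import Callable, Optional, Tuple

State = Tuple[float, bool]  # hypothetical observation: (today's price, holding a unit?)
KEEP = 0.85  # share of a sale price left after Steam's 15% commission

def run_episode(prices: list, policy: Callable[[State], str], cash: float = 100.0) -> float:
    """One pass over a price series: observe, act, receive a reward. Returns the total reward."""
    held: Optional[float] = None  # purchase price of the unit we hold, if any
    total_reward = 0.0
    for price in prices:
        state = (price, held is not None)  # what the agent observes
        action = policy(state)             # what the agent decides to do
        reward = 0.0
        if action == "buy" and held is None and cash >= price:
            cash, held = cash - price, price
        elif action == "sell" and held is not None:
            reward = price * KEEP - held   # realized profit after commission
            cash, held = cash + price * KEEP, None
        total_reward += reward             # the quantity an RL agent learns to maximize
    return total_reward

def threshold_policy(state: State) -> str:
    """Hypothetical fixed policy; an RL agent would learn this mapping instead."""
    price, holding = state
    if not holding and price < 9:
        return "buy"
    if holding and price > 10:
        return "sell"
    return "hold"

print(run_episode([10, 8, 12], threshold_policy))  # approximately 2.2 (12 * 0.85 - 8)
```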

The basic algorithm consisted of empirical rules, which we optimized with genetic algorithms. We then added a Deep Q-Learning algorithm, which let us construct a rudimentary decision-making policy.

As a final step, we used an Actor-Critic algorithm to optimize the decision-making policy at every step of the process, rather than only at the end of the learning exercise. After all, a real trader evaluates each trade as it happens.

**Starting with a basic algorithm**

Our experiment used data from the Steam Community Market. Of the several thousand items in this market, we selected only 48, using three criteria that we set. We tracked the daily price history of these items over 430 days. You can read more about the full selection criteria and methodology…

Next, we set up a basic algorithm that accounts for the structure of the Steam Community Market and our observations. The algorithm followed these four steps:

- Calculate the maximum price over the last three days of trading on the market.

- If the current price is less than that maximum multiplied by 0.85 (because 15% of the price goes to Steam as market commission), buy the item, save the purchase price, and treat the purchase price as the new maximum.

- If the purchase price is lower than the current price multiplied by 0.85 (that is, selling now would still be profitable after commission), put the item up for sale.

- If the purchased item has not been sold within seven days, reduce its price by 5% and deduct three days from the count of days without a sale, so the next reduction comes after three more unsold days.
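The four rules can be sketched as a single-item backtest. This is an illustration under stated assumptions, not the code used in the experiment: it holds at most one unit at a time (so the “new maximum” part of the second rule never comes into play), and it assumes a listing sells on a later day whose market price reaches the asking price. The names `RuleTrader` and `backtest` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

KEEP = 0.85  # share of a sale price left after Steam's 15% commission

@dataclass
class RuleTrader:
    """The four rules for one item. Assumptions: at most one unit held at a time,
    and a listing fills on a later day whose price reaches the ask."""
    cash: float
    purchase_price: Optional[float] = None
    ask: Optional[float] = None  # current listing price, None if not listed
    days_unsold: int = 0
    trades: List[Tuple[str, float]] = field(default_factory=list)

    def step(self, history: List[float], price: float) -> None:
        if self.purchase_price is None:
            # Rules 1-2: buy if today's price is below 0.85 x the three-day maximum.
            if len(history) >= 3 and price < max(history[-3:]) * KEEP and self.cash >= price:
                self.cash -= price
                self.purchase_price = price
                self.trades.append(("buy", price))
            return
        if self.ask is None:
            # Rule 3: list once selling at today's price beats the purchase price after commission.
            if self.purchase_price < price * KEEP:
                self.ask, self.days_unsold = price, 0
            return
        if price >= self.ask:
            # Assumed fill model: the listing sells when the market price reaches the ask.
            self.cash += self.ask * KEEP
            self.trades.append(("sell", self.ask))
            self.purchase_price = self.ask = None
            return
        # Rule 4: after seven unsold days, cut the ask by 5% and deduct three days.
        self.days_unsold += 1
        if self.days_unsold >= 7:
            self.ask *= 0.95
            self.days_unsold -= 3

def backtest(prices: List[float], cash: float = 100.0) -> RuleTrader:
    trader = RuleTrader(cash=cash)
    for t, price in enumerate(prices):
        trader.step(prices[:t], price)  # history excludes today's price
    return trader
```

For example, `backtest([10, 10, 10, 8, 9, 10, 11, 11])` buys at 8 (below 0.85 × 10), lists at 10 once 8 < 0.85 × 10, sells the next day when the price reaches 11, and ends with 100.5 units of cash from a starting 100.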

So, what were the results of this basic algorithm? Well, trading with this algorithm produced a profit of 179% over 16 months.
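To compare the 16-month result with shorter horizons, it can be converted into an equivalent compound monthly rate. This assumes that a 179% profit means the final value was 2.79 times the starting value and that gains compounded; neither detail is stated explicitly in the article.

```python
total_return = 1.79  # 179% profit, read as final value = 2.79 x starting value
months = 16
# Equivalent constant monthly rate r such that (1 + r) ** months == 1 + total_return.
monthly = (1 + total_return) ** (1 / months) - 1
print(f"{monthly:.2%}")  # 6.62%
```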

**Adding a genetic algorithm**

Next, we wanted to improve on the results of the basic algorithm, so we added a genetic algorithm. At every step, we randomly generated 50 agents according to a set of statistical parameters. We then picked the best-performing agents based on the return they achieved in the market, added a new mutation to each one, and repeated the process.
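A sketch of this kind of genetic search, under assumptions: each agent is a hypothetical genome `(window, buy_factor, sell_factor)` that parametrizes simplified versions of the first three rules (one unit at a time, sales fill at the day’s price, no price cuts), fitness is final cash on a price series, and each generation keeps the 10 best of 50 agents and refills the population with mutated copies of them. Keeping the best agents unchanged (elitism) is a common GA choice and may differ from the experiment, where a mutation was added to each selected agent.

```python
import random
from typing import List, Tuple

KEEP = 0.85  # share of a sale price left after Steam's 15% commission
Params = Tuple[int, float, float]  # hypothetical genome: (window, buy_factor, sell_factor)

def fitness(params: Params, prices: List[float], cash: float = 100.0) -> float:
    """Final cash from a simplified one-unit backtest of the parametrized rules."""
    window, buy_factor, sell_factor = params
    held = None
    for t in range(window, len(prices)):
        price = prices[t]
        if held is None and price < max(prices[t - window:t]) * buy_factor and cash >= price:
            cash, held = cash - price, price
        elif held is not None and price * KEEP > held * sell_factor:
            cash, held = cash + price * KEEP, None
    if held is not None:
        cash += prices[-1] * KEEP  # assumption: liquidate leftovers so unsold stock earns no credit
    return cash

def random_agent(rng: random.Random) -> Params:
    return (rng.randint(2, 7), rng.uniform(0.7, 0.95), rng.uniform(1.0, 1.3))

def mutate(params: Params, rng: random.Random) -> Params:
    window, buy, sell = params
    return (max(2, window + rng.choice((-1, 0, 1))),
            min(0.99, max(0.5, buy + rng.gauss(0, 0.02))),
            max(1.0, sell + rng.gauss(0, 0.02)))

def evolve(prices: List[float], generations: int = 20, pop_size: int = 50,
           n_elite: int = 10, seed: int = 0) -> Params:
    rng = random.Random(seed)
    population = [random_agent(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness, keep the best, refill with mutated copies of them.
        ranked = sorted(population, key=lambda a: fitness(a, prices), reverse=True)
        elite = ranked[:n_elite]
        population = elite + [mutate(rng.choice(elite), rng) for _ in range(pop_size - n_elite)]
    return max(population, key=lambda a: fitness(a, prices))
```

With `window=3`, `buy_factor=0.85` and `sell_factor=1.0`, `fitness` reproduces the hand-set thresholds of the basic algorithm, minus the price-cut rule. Because the best agents survive each generation, the best fitness never decreases from one generation to the next.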

Looking at the resulting agents, we found that their returns fluctuated and did not consistently beat the basic algorithm by a large margin. That pointed to a problem with how the agents were learning.

For example, in one iteration we found that the agent finished executing with a portfolio of items worth 49,630 monetary units, but with only 454 monetary units in the bank. Essentially, the agent anticipated high profits in the future and therefore never sold any items.
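These figures show how a score that counts unsold items at market value can reward hoarding. The sketch below compares that mark-to-market total with a hypothetical liquidation value that charges the 15% commission on every item; it assumes the items could actually be sold at their recorded value, which the article does not claim.

```python
items_value, cash = 49_630, 454          # figures from the iteration described above
mark_to_market = items_value + cash      # 50,084: the total a value-based score would see
cash_share = cash / mark_to_market       # under 1% of that total is actually cash
liquidation = items_value * 0.85 + cash  # 42,639.5 if every item sold at its value after commission
print(mark_to_market, f"{cash_share:.1%}", liquidation)
```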

**Training the agent with a reinforcement learning algorithm**

We needed to add a machine learning algorithm to help the agent maximize profits based on what it learns by interacting with its environment. Deep Q-Learning allowed us to find a relatively simple decision-making policy that is not tied to a specific action at a specific point in time.
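Deep Q-Learning replaces a table of action values with a neural network. To keep the idea visible and dependency-free, the sketch below uses plain tabular Q-learning, the method DQN generalizes, on one item’s price series. The state features, the reward (realized profit after commission on each sale), and all hyperparameters are hypothetical, not the experiment’s.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

ACTIONS = ("hold", "buy", "sell")
KEEP = 0.85  # share of a sale price left after Steam's 15% commission
State = Tuple[bool, bool]  # hypothetical: (price >15% below 3-day max?, holding a unit?)

def observe(prices: List[float], t: int, holding: bool) -> State:
    window = prices[max(0, t - 3):t] or [prices[t]]
    return (prices[t] < max(window) * KEEP, holding)

def act(price: float, holding: bool, cost: float, action: str) -> Tuple[bool, float, float]:
    """Returns (holding, cost basis, reward); the reward is realized profit on a sale."""
    if action == "buy" and not holding:
        return True, price, 0.0
    if action == "sell" and holding:
        return False, 0.0, price * KEEP - cost
    return holding, cost, 0.0

def train(prices: List[float], episodes: int = 500, alpha: float = 0.1,
          gamma: float = 0.95, epsilon: float = 0.1, seed: int = 0) -> Dict[State, List[float]]:
    rng = random.Random(seed)
    q: Dict[State, List[float]] = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        holding, cost = False, 0.0
        for t in range(len(prices) - 1):
            s = observe(prices, t, holding)
            if rng.random() < epsilon:  # explore
                a = rng.randrange(len(ACTIONS))
            else:                       # exploit current estimates
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            holding, cost, reward = act(prices[t], holding, cost, ACTIONS[a])
            s_next = observe(prices, t + 1, holding)
            # Q-learning update toward reward + discounted best next-state value.
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
    return q

def greedy_action(q: Dict[State, List[float]], s: State) -> str:
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
```

In a DQN, `q` becomes a network trained toward the same target, `reward + gamma * max Q(next state)`, typically with experience replay and a separate target network for stability.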

For this stage of the experiment, we selected only one of the items and compared the results with the basic algorithm applied to that single item.

We won’t go into the precise methodology of the Deep Q-Learning algorithm here, but the net result was that, after training and running the model, the total portfolio value was 15,273 monetary units: 21% higher than the base model and 4% higher than the optimized basic model.
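Reading both percentages as relative to the comparison model’s final portfolio value (an assumption; the article does not spell this out), the implied values of the two baselines can be backed out:

```python
dqn_value = 15_273
base_value = dqn_value / 1.21       # "up 21% on the base model"
optimized_value = dqn_value / 1.04  # "4% higher than the optimized basic model"
print(round(base_value), round(optimized_value))  # 12622 14686
```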

Next, we tried an Actor-Critic algorithm on the same data, but with a reward function that combined a range of features. For example, if the agent reaches the end of the data with a positive balance, it is rewarded with the amount of money it earned.

We also defined the trading environment and the models for both the actor and the critic. Training lasted 5,000 epochs with a learning rate of 0.0003.
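A minimal one-step Actor-Critic sketch, in plain Python rather than with the neural-network actor and critic the experiment used: the actor is a softmax over per-state action preferences, the critic is a per-state value table, and both are updated after every step from the critic’s temporal-difference error. The terminal reward follows the rule described above (money earned, paid only if the episode ends in profit); the state features, the discount factor, one shared learning rate, and treating an “epoch” as one pass over the price series are assumptions. The defaults mirror the reported 5,000 epochs and learning rate of 0.0003.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

ACTIONS = ("hold", "buy", "sell")
KEEP = 0.85  # share of a sale price left after Steam's 15% commission
State = Tuple[bool, bool]  # hypothetical: (price >15% below 3-day max?, holding a unit?)

def observe(prices: List[float], t: int, holding: bool) -> State:
    window = prices[max(0, t - 3):t] or [prices[t]]
    return (prices[t] < max(window) * KEEP, holding)

def softmax(prefs: List[float]) -> List[float]:
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(prices, prefs, values, rng, lr, gamma=0.99, start_cash=100.0) -> float:
    cash, held = start_cash, None
    last = len(prices) - 1
    for t, price in enumerate(prices):
        s = observe(prices, t, held is not None)
        pi = softmax(prefs[s])                               # actor: action probabilities
        a = rng.choices(range(len(ACTIONS)), weights=pi)[0]
        if ACTIONS[a] == "buy" and held is None and cash >= price:
            cash, held = cash - price, price
        elif ACTIONS[a] == "sell" and held is not None:
            cash, held = cash + price * KEEP, None
        done = t == last
        # Terminal reward as described: the money earned, if the episode ends in profit.
        reward = max(cash - start_cash, 0.0) if done else 0.0
        next_value = 0.0 if done else values[observe(prices, t + 1, held is not None)]
        delta = reward + gamma * next_value - values[s]    # critic's TD error
        values[s] += lr * delta                              # critic update
        for b in range(len(ACTIONS)):                        # actor: log-softmax gradient step
            prefs[s][b] += lr * delta * ((1.0 if b == a else 0.0) - pi[b])
    return cash

def train(prices, epochs=5000, lr=0.0003, seed=0):
    rng = random.Random(seed)
    prefs: Dict[State, List[float]] = defaultdict(lambda: [0.0] * len(ACTIONS))
    values: Dict[State, float] = defaultdict(float)
    for _ in range(epochs):
        run_episode(prices, prefs, values, rng, lr)
    return prefs, values
```

In this sketch the reward is zero on every step except the last, and a unit still held at the end earns nothing, so the learning signal is sparse and reaches earlier steps only through the critic’s value estimates.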

**Our agent simply didn’t trade – so we made some tweaks**

The result of this Actor-Critic run was disappointing: the agent concluded that inaction was the best option, which meant the model had no value. We adjusted a few elements of the model, but most of the work went into the reward function, where, among other changes, we deleted the time-elapsed parameter and set the reward for a purchase to zero.
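The two changes can be expressed as settings of a reward function. The article reports neither the original purchase reward nor the form of the time-elapsed term, so in this sketch the time term is modelled, purely as a placeholder, as a linear per-step penalty; `RewardConfig` and `reward` are hypothetical names. The terminal term is the money-earned reward described earlier.

```python
from dataclasses import dataclass

@dataclass
class RewardConfig:
    purchase_reward: float  # bonus per purchase; the tweak set this to zero
    time_weight: float      # weight of the time-elapsed term; 0.0 mirrors deleting it

def reward(cfg: RewardConfig, action: str, step: int, done: bool,
           cash: float, start_cash: float) -> float:
    r = cfg.purchase_reward if action == "buy" else 0.0
    r -= cfg.time_weight * step  # hypothetical form of the time-elapsed term
    if done:
        r += max(cash - start_cash, 0.0)  # terminal reward: money earned, if any
    return r

tweaked = RewardConfig(purchase_reward=0.0, time_weight=0.0)
```

With both terms at zero, as after the tweak, every step before the last returns 0.0 and only a profitable finish is rewarded.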

Unfortunately, these tweaks did not get the agent trading again, which suggests that the reward function for an Actor-Critic algorithm needs a more complex structure and additional domain knowledge.

**What did we learn?**

Well, it is clear that algorithms can be used to optimize decision-making in an uncertain market, such as the Steam Community Market or, potentially, the stock market, and to do so profitably. Rule-based algorithms tuned with genetic algorithms work but require further input.

A Deep Q-Learning algorithm works on the current data and showed the best results. It can also be combined with other approaches or extended with further parameters. That said, in our experiment the market did not respond to the agent’s actions, because prices were predefined and historical.

The Actor-Critic algorithm took a lot of time and effort, and with the data we had it was hard to achieve good trading results. Still, we investigated the problem and formed some hypotheses about what could improve the training process. Again, you can read full details of our Actor-Critic algorithm here.

With a bit more time and experimentation, we may get to the point where these algorithms can be used to apply to real trading in the stock market – but without a doubt, there is much more to be done.