Machine learning (ML) development is an iterative process: the accuracy of a model's predictions is improved by repeating the training and evaluation phases. In each iteration, developers adjust certain parameters. Any parameter that is selected manually, based on what was learned from previous experiments, qualifies as a model hyper-parameter. These parameters represent intuitive decisions whose values cannot be estimated from the data or derived from ML theory. Hyper-parameters are the knobs you tweak between training runs to improve the accuracy of the model's predictions; they are variables that govern the training process itself. They are often specified by practitioners experienced in ML development and are usually tuned separately for each predictive modeling problem.
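To make the distinction concrete, here is a minimal sketch in plain Python. It assumes a toy linear model fitted by gradient descent; the function names, the data, and the candidate learning rates are all illustrative choices, not taken from any particular library. The weights `w` and `b` are learned from the data, while `learning_rate` and `epochs` are hyper-parameters fixed by the developer before training starts.

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    learning_rate and epochs are hyper-parameters: chosen by hand, not learned.
    """
    w, b = 0.0, 0.0  # model parameters: these are estimated from the data
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (an assumption made for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Repeat training with different hand-picked learning rates and compare.
results = {}
for learning_rate in (0.001, 0.01, 0.05):
    w, b = train_linear_model(xs, ys, learning_rate, epochs=200)
    results[learning_rate] = mean_squared_error(xs, ys, w, b)
    print(f"learning_rate={learning_rate}: error={results[learning_rate]:.6f}")
```

Each pass through the loop is one "iteration" in the sense used above: the training code is unchanged, and only the hand-picked hyper-parameter value differs between runs.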
Building an ML model is a long process that requires domain knowledge, experience, and intuition. In ML, hyper-parameter optimization, or tuning, is the problem of choosing an optimal set of hyper-parameters for a learning algorithm. We may not know the best combination of hyper-parameter values for a given problem in advance. We may use rules of thumb, copy values that worked on other problems, or search for the best values by trial and error. When a machine learning algorithm is adapted to a specific problem through the higher-level optimization APIs, the hyper-parameters also need to be tuned to find the values that yield a model with higher prediction accuracy. Hyper-parameter tuning is often described as searching the parameter space for optimal values. With deep learning models, the search space is usually very large, and a single model might take days to train. The common hyper-parameters are:
Model optimization using hyper-parameter tuning is a search problem: the goal is to identify the ideal combination of these parameters. The commonly used methods are grid search, random search, and Bayesian optimization. In grid search, a list of candidate values is constructed for each hyper-parameter within a specified range, and every combination of these values is tried in turn. The number of experiments therefore grows multiplicatively with the number of hyper-parameters; for example, three hyper-parameters with ten candidate values each require 10 × 10 × 10 = 1,000 training runs. Rather than training on all possible configurations, random search trains the network on only a subset of them. The configurations to train are picked at random, and the best-performing configuration found is kept. Bayesian optimization uses ML techniques to choose the hyper-parameters: it predicts which regions of the hyper-parameter space are likely to give better results. A Gaussian process is the technique typically used as the predictive model; it is fitted to the results of previously conducted experiments across various parameter configurations and used to estimate the optimal hyper-parameters.
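The difference between grid search and random search can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: `validation_score` is a hypothetical stand-in for "train a model and return its validation accuracy", and the two hyper-parameters, their ranges, and the trial budget are chosen only for the example. Bayesian optimization is left out because it needs a surrogate model such as a Gaussian process.

```python
import itertools
import math
import random


def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and scoring it.

    By construction the score peaks at learning_rate=0.01, batch_size=64.
    """
    lr_penalty = 0.1 * (math.log10(learning_rate) + 2) ** 2
    batch_penalty = ((batch_size - 64) / 256) ** 2
    return 1.0 - lr_penalty - batch_penalty


def grid_search(grid):
    """Try every combination of the candidate values, one after another."""
    names = list(grid)
    best_config, best_score, trials = None, float("-inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = validation_score(**config)
        trials += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, trials


def random_search(n_trials, seed=0):
    """Evaluate only n_trials randomly sampled configurations; keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = validation_score(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, n_trials


grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128, 256],
}
grid_best_config, grid_best_score, grid_trials = grid_search(grid)
random_best_config, random_best_score, random_trials = random_search(n_trials=8)

print("grid search:  ", grid_trials, "trials, best", grid_best_config)
print("random search:", random_trials, "trials, best", random_best_config)
```

With four learning rates and five batch sizes, the grid needs 4 × 5 = 20 evaluations, and each additional hyper-parameter multiplies that count again. Random search stays within whatever budget is set (8 trials here), at the cost of possibly missing the best configuration.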