Machine Learning Model Training with Hyperparameters
Author: Andrew McQueen
In machine learning, the goal is to estimate an unknown function. A machine learning model is a set of rules that identify patterns in data. You can use it to make predictions and draw conclusions.
We want a procedure that accurately estimates this function, but many factors affect our ability to do it. Hyperparameters help estimate this unknown function by setting some constraints on the learning process of the model.
Different machine learning models suit different problems, so we select the approach that fits each situation. The right choice depends on the overall objective.
If we only need predictions, a very flexible model that is hard to interpret may serve well. If we want to understand the effects of our variables, a less flexible approach that allows for interpretation is the better choice. Understanding hyperparameters and what they do is essential for building effective machine learning models for a particular job.
Hyperparameters and Parameters
You set hyperparameters before training, and they define the learning process. Hyperparameters guide our chosen algorithm in finding optimal parameters. For example, we can control the depth of trees in a random forest or change the learning rate in a neural network.
Hyperparameters are an important part of training a high-quality model, but they do not describe the model itself. The model learns its parameters from data rather than having them chosen in advance. Parameters, determined through training, represent the model. How we configure the learning process affects the resulting model, so we need to check model performance under different configurations.
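The distinction can be illustrated with a minimal sketch. It assumes a toy one-variable regression fitted by gradient descent; the function name `fit_slope` and the data are hypothetical and chosen only for illustration. The learning rate and number of steps are hyperparameters that we pick, while the slope `w` is a parameter that training produces.

```python
def fit_slope(xs, ys, learning_rate, n_steps):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: learned from the data, not chosen by us
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Illustrative data generated from y = 2x (an assumption for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data and algorithm, different hyperparameters (learning_rate).
w_good = fit_slope(xs, ys, learning_rate=0.05, n_steps=200)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_steps=200)

print(f"learning_rate=0.05   -> w = {w_good:.4f}")
print(f"learning_rate=0.0001 -> w = {w_slow:.4f}")
```

With the larger learning rate the learned slope lands very close to 2, while the smaller one is still far from it after the same number of steps, which is why performance has to be checked under different configurations.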
Machine learning algorithms have various hyperparameters. Here are a few examples used in three different algorithms.
- Random forest
  - Number of estimators
  - Maximum number of features
  - Maximum tree depth
- Support Vector Machine
  - C
  - Gamma
  - Kernel
- Neural Network
  - Number of hidden layers
  - Neurons per hidden layer
  - Learning rate
LLM Hyperparameters
Large language models have a variety of hyperparameters affecting their performance. Some to consider are:
- Temperature
- Top P
- Penalties
Temperature
In an LLM, temperature affects the randomness of the model’s output. A lower temperature gives more predictable output, while a higher temperature gives more varied, creative responses.
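One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that formulation; the function name and the three example logits are hypothetical, and a specific LLM provider may implement the setting differently.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

print("temperature 0.5:", [round(p, 3) for p in cold])
print("temperature 2.0:", [round(p, 3) for p in hot])
```

At the lower temperature most of the probability mass sits on the top token, so sampling is more predictable. At the higher temperature the distribution flattens and less likely tokens are chosen more often.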
Top P
Top P works in tandem with temperature to control randomness. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches a threshold. The threshold ranges from zero to one, and a higher value lets the model consider more of the possible tokens.
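The cumulative-threshold idea can be sketched as follows. This is a simplified illustration, not any particular provider's implementation; the function name `top_p_filter` and the token probabilities are made up for the example.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "axolotl": 0.05}

narrow = top_p_filter(probs, top_p=0.7)
wide = top_p_filter(probs, top_p=0.9)

print("top_p=0.7 keeps:", sorted(narrow))
print("top_p=0.9 keeps:", sorted(wide))
```

Raising the threshold admits more tokens into the candidate set, which is the sense in which a higher Top P considers more of the possible outcomes.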
Penalties
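Two other hyperparameters are the frequency and presence penalties, which deal with repetition in model output. Both decrease the likelihood of using a term or phrase based on its use in the prompt and output: the frequency penalty grows with how often the term has already appeared, while the presence penalty applies once a term has appeared at all.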
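A rough sketch of how the two penalties might adjust token scores is shown below. It assumes one commonly documented formulation, in which each logit is reduced by the frequency penalty times the token's count plus the presence penalty if the count is nonzero. The function name, logits, and token history are hypothetical, and providers differ in the details.

```python
from collections import Counter


def apply_penalties(logits, seen_tokens, frequency_penalty, presence_penalty):
    """Lower the logits of tokens that have already appeared.

    Assumed formulation (illustrative only):
        adjusted = logit - frequency_penalty * count
                         - presence_penalty * (1 if count > 0 else 0)
    """
    counts = Counter(seen_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        adjusted[token] = (
            logit
            - frequency_penalty * count
            - presence_penalty * (1.0 if count > 0 else 0.0)
        )
    return adjusted


# Hypothetical logits and previously seen tokens.
logits = {"the": 3.0, "cat": 2.0, "sat": 1.0}
history = ["the", "cat", "the"]

adjusted = apply_penalties(
    logits, history, frequency_penalty=0.5, presence_penalty=0.25
)
print(adjusted)
```

Here "the" is penalized most because it appeared twice, "cat" is penalized less, and "sat" is untouched because it has not appeared yet.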
Tuning Hyperparameters
To get the best model, we need the right hyperparameters because they affect how the model parameters are learned. To tune hyperparameters, we run a search and set an objective function. The search minimizes the objective function, which is a chosen metric, typically validation error.
This approach is effective because validation error is computed on a held-out validation set that the model did not see during training. Common search methods are grid search, random search, and Bayesian optimization. Manual tuning is also an option but tends to take much longer than defining a hyperparameter search space and automating the search.
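A grid search over a small search space might look like the sketch below. It assumes a toy k-nearest-neighbors regressor written in plain Python and synthetic data; the helper names, the search space, and the data are all hypothetical. The objective function is the mean squared error on a held-out validation split.

```python
import itertools
import random


def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    if weighted:
        # Closer neighbors count more; the small constant avoids division by zero.
        weights = [1.0 / (abs(nx - x) + 1e-9) for nx, _ in neighbors]
    else:
        weights = [1.0] * len(neighbors)
    weighted_sum = sum(w * ny for w, (_, ny) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


def validation_error(train, validation, k, weighted):
    """Objective function: mean squared error on the validation split."""
    errors = [(knn_predict(train, x, k, weighted) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for this sketch): y = x^2 plus Gaussian noise.
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(60)]
rng.shuffle(data)
train, validation = data[:40], data[40:]

# Search space: every combination of these values is evaluated.
search_space = {"k": [1, 3, 5, 9], "weighted": [False, True]}

results = []
for k, weighted in itertools.product(search_space["k"], search_space["weighted"]):
    error = validation_error(train, validation, k, weighted)
    results.append((error, {"k": k, "weighted": weighted}))

best_error, best_params = min(results, key=lambda r: r[0])
print(f"best validation MSE {best_error:.3f} with {best_params}")
```

A random search would sample combinations from the same space instead of enumerating all of them, which can be cheaper when the space is large.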
Adjusting hyperparameters requires finding a balance between the complexity of the model and the size of the data set. It also involves weighing cost against performance when running a thorough search. Because of these trade-offs, a thorough search is not always possible, and the decision comes down to resources, money, and time.
Optimal hyperparameters are unknown without experimentation. Tuning helps avoid overfitting on the training set. Hyperparameter search optimizes an objective function, but the usefulness of hyperparameters also depends on the model’s purpose. Where interpretation is important, tuning can keep models from becoming overly complex.
Hyperparameters in Databricks
Databricks offers Hyperopt as a solution to hyperparameter tuning. It supports distributed hyperparameter optimization and lets us choose which objective function to minimize, define the search space, and specify a search algorithm.
Paired with Databricks Experiments, we can easily track runs and see how hyperparameters affect the results. While the objective function determines optimal hyperparameters, we can still compare our models’ metrics to determine which should be used. This offers a convenient way to choose which model fits our needs most appropriately.
Summary
Hyperparameters are a key aspect of machine learning because they determine the learning process. Tuning is a necessary step in building effective models. Common optimization methods use an automated process that iterates over a defined search space and finds the hyperparameters that minimize an objective function.