Mastering the Art of Evaluation

Unlocking the true potential of your models

A.I Hub · Aug 14, 2024

In the realm of machine learning, where every decision can make or break a model's success, evaluation is the ultimate proving ground. It’s where theory meets reality and where the true performance of an algorithm is unveiled. Far from being a mere final step, evaluation is the critical process that determines whether your model is a groundbreaking success or just another experiment. This is the moment of truth, where data-driven insights collide with rigorous metrics to reveal the real value of your work. Welcome to the art and science of evaluation, where every number tells a story and every metric has the power to transform outcomes.

Table of Contents

  • Evaluation
  • Analysis
  • Transformers for regression
  • The dataset
  • Pre-process the data
  • Define model configuration
  • Train and evaluate

Evaluation


Figure 1.1 presents the benchmark results achieved by various machine learning models, while Table 1.1 displays the outcomes of our own experiments.

Figure 1.1 - Baseline results of various algorithms
Table 1.1 - Results of our experiments

Analysis


Figure 1.1 indicates that the best performance across several studies has been achieved using the XGBoost model, yielding an accuracy of approximately 87%. In our case, employing the FT Transformer resulted in an accuracy of around 85%. This discrepancy suggests the potential for further investigation and experimentation.

It is essential to note that our current model’s evaluation does not include cross-validation, a technique often used to assess the robustness of a model. In addition, we did not incorporate any feature transformation or feature engineering methods, which are commonly used to enhance the performance of a model.

Yet it is impressive that our straightforward application of the FT Transformer still achieved results close to the best recorded performance. This reinforces the potential of transformer models and suggests that, with some fine-tuning and enhancements, we may even surpass the benchmark set by XGBoost.
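
To make the missing cross-validation step concrete, here is a minimal sketch of 5-fold cross-validation using scikit-learn's cross_val_score. The estimator and synthetic data are stand-ins for illustration, not the article's FT Transformer pipeline:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Stand-in data; in practice you would pass your own
# preprocessed feature matrix and target
X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")

# The spread across folds indicates how stable performance is
# under different train/validation splits
print(f"R2 per fold: {np.round(scores, 3)}")
print(f"Mean: {scores.mean():.3f} +/- {scores.std():.3f}")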

Transformers for Regression


In this demonstration, we will explore the application of a transformer model
for regression tasks, specifically using the Ames Housing dataset.

The Dataset


The Ames Housing dataset is a comprehensive record of individual residential property sales that occurred in Ames, Iowa, between 2006 and 2010. With over 80 explanatory variables, this dataset provides a plethora of information useful for predictive modeling, as it offers a rich array of factors that contribute to home values.

These factors span a wide spectrum:

  1. General attributes of the property, such as the type of dwelling, its zoning classification, proximity to amenities and roads, and the overall configuration and layout of the property and lot.
  2. Detailed attributes of the house itself, including the roof type, exterior materials, masonry work and foundation type.
  3. Comprehensive ratings of the overall quality and condition of various parts of the house, ranging from the exterior finish to the heating system.
  4. Detailed information about specific areas within the house, such as the basement, garage and porch, as well as the presence of a pool. This also includes details about the number and quality of rooms, bedrooms, kitchens and bathrooms.
  5. Specifics about the sale transaction, like the type and condition of the sale and the month and year the sale took place.

The goal is to predict the final sale price of each property, making this a regression problem: we employ machine learning to forecast the sale price based on all the other variables.
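
Before any preprocessing, a quick look at the raw file (the same URL used in the code below) can confirm the shape described above:

import pandas as pd

url = "https://raw.githubusercontent.com/wblakecannon/ames/master/data/housing.csv"
raw_df = pd.read_csv(url)

print(raw_df.shape)                    # rows x columns (80+ explanatory variables)
print(raw_df['SalePrice'].describe())  # summary statistics of the regression target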

Pre-process the Data


The code presented here performs several data processing tasks for a machine learning experiment.

  1. Firstly, it downloads the Ames Housing dataset from a specified URL using the pandas read_csv method, saving it to a dataframe.
  2. It then defines lists of the categorical and numerical columns, as well as the target column (SalePrice).
  3. The code proceeds to handle missing values in the data. For categorical columns, it fills in missing values with the most frequent value (mode) of that column. For numerical columns, including the target, it fills in missing values with the median value of that column.
  4. After handling missing values, the code uses MinMaxScaler from the Scikit-learn library to scale the numerical columns. This normalization adjusts all numerical values to fall within the same range, typically 0 to 1, which is often beneficial for machine learning algorithms.
  5. Finally, it splits the pre-processed dataframe into a training set (80% of the data) and a test set (the remaining 20%) and displays the first few rows of the resulting dataframe for verification.
# Download the dataset
import pandas as pd

url = "https://raw.githubusercontent.com/wblakecannon/ames/master/data/housing.csv"
ames_df = pd.read_csv(url)

# Lists of categorical and numerical columns
cat_cols = ['Garage Yr Blt', 'Mo Sold', 'Yr Sold', 'Open Porch SF',
            'Enclosed Porch', '3Ssn Porch', 'Screen Porch', 'Wood Deck SF',
            'Fireplaces', 'Year Remod/Add', 'Year Built', 'Overall Cond',
            'Overall Qual', 'MS SubClass', 'MS Zoning', 'Street', 'Alley',
            'Lot Shape', 'Land Contour', 'Utilities', 'Lot Config',
            'Land Slope', 'Neighborhood', 'Condition 1', 'Condition 2',
            'Bldg Type', 'House Style', 'Roof Style', 'Roof Matl',
            'Exterior 1st', 'Exterior 2nd', 'Mas Vnr Type', 'Exter Qual',
            'Exter Cond', 'Foundation', 'Bsmt Qual', 'Bsmt Cond',
            'Bsmt Exposure', 'BsmtFin Type 1', 'BsmtFin Type 2', 'Heating',
            'Heating QC', 'Central Air', 'Electrical', 'Kitchen Qual',
            'Functional', 'Fireplace Qu', 'Garage Type', 'Garage Finish',
            'Garage Qual', 'Garage Cond', 'Paved Drive', 'Pool QC', 'Fence',
            'Misc Feature', 'Sale Type', 'Sale Condition']

num_cols = ['Lot Frontage', 'Lot Area', 'Mas Vnr Area', 'BsmtFin SF 1',
            'BsmtFin SF 2', 'Bsmt Unf SF', 'Total Bsmt SF', '1st Flr SF',
            '2nd Flr SF', 'Low Qual Fin SF', 'Gr Liv Area', 'Bsmt Full Bath',
            'Bsmt Half Bath', 'Full Bath', 'Half Bath', 'Bedroom AbvGr',
            'Kitchen AbvGr', 'TotRms AbvGrd', 'Garage Cars', 'Garage Area',
            'Pool Area', 'Misc Val']

target_col = ['SalePrice']

# Perform null value imputation:
# replace NaN in categorical columns with the mode
for col in cat_cols:
    ames_df[col] = ames_df[col].fillna(ames_df[col].mode()[0])

# Replace NaN in continuous columns (and the target) with the median
for col in num_cols + target_col:
    ames_df[col] = ames_df[col].fillna(ames_df[col].median())

# Drop any rows that still contain missing values
ames_df = ames_df.dropna()

# Check the shape of the dataframe
print(ames_df.shape)

# Min-max scaling of the numerical columns and the target
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
cols_to_scale = num_cols + target_col

# Fit the scaler to the columns in 'cols_to_scale'
scaler.fit(ames_df[cols_to_scale])

# Transform the columns
ames_df[cols_to_scale] = scaler.transform(ames_df[cols_to_scale])

# Train/test split (80% / 20%)
train = ames_df.sample(frac=0.8, random_state=0)
test = ames_df.drop(train.index)

# Check the first few rows
print(ames_df.head())
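
As a quick sanity check (not part of the original snippet), you can confirm that imputation left no missing values and that the scaled columns now fall within the expected range:

# Hypothetical verification step: no NaNs remain and scaling is within [0, 1]
assert ames_df[cat_cols + num_cols + target_col].isna().sum().sum() == 0
assert ames_df[cols_to_scale].min().min() >= 0.0
assert ames_df[cols_to_scale].max().max() <= 1.0
print(f"Train: {train.shape}, Test: {test.shape}")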

Define Model Configuration


The code for setting up a machine learning experiment with the FT Transformer model is as follows.

# Imports from the pytorch_tabular library (paths per recent versions)
from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import FTTransformerConfig
from pytorch_tabular.models.common.heads import LinearHeadConfig

data_config = DataConfig(
    target=target_col,            # target column name
    continuous_cols=num_cols,     # numerical column names
    categorical_cols=cat_cols,    # categorical column names
    continuous_feature_transform="quantile_normal",
    normalize_continuous_features=True,
)

trainer_config = TrainerConfig(
    auto_lr_find=True,            # find a good learning rate automatically
    batch_size=256,
    max_epochs=100,
    early_stopping="valid_loss",  # monitor validation loss
    early_stopping_mode="min",
    early_stopping_patience=5,
    checkpoints="valid_loss",
    load_best=True,               # restore the best checkpoint after training
)

optimizer_config = OptimizerConfig()

# Specify the model configuration
head_config = LinearHeadConfig(
    layers="",                    # no additional layers, just a mapping to output_dim
    dropout=0.1,
    initialization="kaiming",
).__dict__                        # convert to dict (OmegaConf doesn't accept objects)

model_config = FTTransformerConfig(
    task="regression",
    learning_rate=1e-3,
    head="LinearHead",            # linear head
    head_config=head_config,      # linear head config
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

Train and Evaluate


Here, we run our experiment and calculate the R-squared score on the test dataset.

tabular_model.fit(train=train)
tabular_model.evaluate(test)
prediction = tabular_model.predict(test)

from sklearn.metrics import r2_score

# r2_score(y_true, y_pred); predict() appends a 'SalePrice_prediction' column
r2 = r2_score(prediction['SalePrice'], prediction['SalePrice_prediction'])
print(f"R2 Score: {r2}")

Output:

R2 Score: 0.735613747041542
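
Note that this score is computed on the min-max-scaled target, although R-squared itself is unchanged by a shared linear rescaling of both arrays. To report predictions in dollars, you can invert the scaling using the fitted scaler's stored minima and maxima, as in this sketch building on the variables defined earlier:

# SalePrice was min-max scaled, so predictions live in [0, 1];
# invert the transform for the target column only
target_idx = cols_to_scale.index('SalePrice')
t_min = scaler.data_min_[target_idx]
t_max = scaler.data_max_[target_idx]

pred_dollars = prediction['SalePrice_prediction'] * (t_max - t_min) + t_min
print(pred_dollars.head())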

Analysis

The R-squared value achieved is decent, but there is still room for improvement through further optimization. You might consider various feature engineering strategies to boost the model's performance. Some of these strategies could include:

  1. Using more sophisticated methods to impute null values, such as nearest neighbors (see the sketch after this list).
  2. Conducting feature selection to reduce the dimensionality of your data and focus on the most informative features.
  3. Applying feature transformations or creating new features to better capture the underlying patterns in your data.
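
As a minimal sketch of the first strategy, here is nearest-neighbor imputation with scikit-learn's KNNImputer, applied to the numerical columns of a fresh copy of the raw data (the n_neighbors value is illustrative):

from sklearn.impute import KNNImputer

raw_df = pd.read_csv(url)  # same URL as in the preprocessing section

imputer = KNNImputer(n_neighbors=5)
raw_df[num_cols] = imputer.fit_transform(raw_df[num_cols])

print(raw_df[num_cols].isna().sum().sum())  # 0 -> all numeric gaps filled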

Conclusion

As we conclude our exploration of the evaluation process, it becomes clear that every step, from analysis to model configuration, plays a pivotal role in achieving excellence in regression tasks. Transformers, when carefully applied to well-preprocessed datasets, demonstrate a remarkable ability to extract meaningful patterns and deliver precise predictions. The careful definition of model configurations, followed by rigorous training and evaluation, underscores the transformative power of these architectures. This journey through evaluation isn't just a technical exercise; it's a testament to the potential of transformers to sharpen data-driven insights and set new benchmarks in the accuracy and reliability of machine learning models. The future of predictive analytics is here, and it is built on the solid foundation of thorough evaluation and cutting-edge technology.
