Unleashing the Power of Tabular Data

The revolutionary impact of the Tab Transformer architecture

A.I Hub
6 min read · Aug 13, 2024

In the dynamic landscape of artificial intelligence, where innovation is the norm, the Tab Transformer architecture is a game-changer for tabular data processing. While traditional methods have long dominated this field, Tab Transformers are redefining what’s possible by harnessing the power of transformers to unlock deeper insights, unprecedented accuracy, and new levels of flexibility. This architecture is not just an incremental improvement; it is a bold leap forward, transforming how we approach the most structured and essential forms of data in ways that were once unimaginable. Welcome to the future of tabular data processing, where Tab Transformers are rewriting the rules.

Table of Contents

  • Tab Transformer architecture
  • FT Transformer architecture
  • Feature tokenizer
  • Concatenation of numerical and categorical features
  • Transformer

Tab Transformer Architecture


The fundamental concept anchoring the Tab Transformer is the generation of
contextual embeddings for categorical variables. Let us delve into the details
of this architecture.

  1. Categorical embeddings — Each categorical feature, denoted x_i, is transformed into a parametric embedding of dimension d using a process known as column embedding.
  2. Transformer encoder — These categorical embeddings are then passed to a transformer encoder, which treats each categorical feature as a token or “word” in a sequence. This enables the model to understand and learn complex interactions between different categorical features.
  3. Contextual embeddings — Inside the transformer encoder, a self-attention mechanism develops contextual embeddings for the categorical variables. Self-attention lets the model weigh the importance and interaction of each categorical feature with every other feature within a given instance (row). This is pivotal, as it allows the model to capture complex interdependencies among the categorical features.
  4. Concatenation of contextual embeddings and normalized numerical variables — Once the transformer has created contextual embeddings for the categorical variables, these are concatenated with the normalized numerical variables. This creates a comprehensive feature set in which both categorical and numerical variables are taken into account, with the former enriched by the contextual information captured by the transformer.
  5. Multilayer perceptron (MLP) — The concatenated data is then passed to an MLP for the final prediction. The MLP serves as the final classifier or regressor, depending on the specific task.
  6. Pretraining and fine-tuning — Like many successful transformer-based models, the TabTransformer employs a two-step process of pretraining and fine-tuning. During pretraining, the model is trained on a large dataset with a reconstruction objective, learning to predict masked (hidden) columns. Once this pretraining step is complete, the model is fine-tuned on the specific task, optimizing for the target objective, for example classification or regression.

By leveraging the strengths of transformer architectures for handling categorical features in tabular data, the Tab Transformer can effectively model intricate feature relationships, leading to high-performance predictions.
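
To make steps 1-5 concrete, here is a minimal PyTorch sketch of the forward pass. The class name, embedding dimension (d = 32), and layer sizes are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch of the TabTransformer forward pass (steps 1-5 above).
# Assumptions: PyTorch, d = 32, toy layer sizes; not the reference implementation.
import torch
import torch.nn as nn

class TabTransformerSketch(nn.Module):
    def __init__(self, cat_cardinalities, num_numerical, d=32, n_heads=4, n_layers=2):
        super().__init__()
        # Step 1: column embeddings, one table per categorical feature.
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(card, d) for card in cat_cardinalities]
        )
        # Steps 2-3: transformer encoder builds contextual embeddings.
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Numerical features are only normalized; they skip the encoder.
        self.num_norm = nn.LayerNorm(num_numerical)
        # Step 5: MLP head over the concatenated representation.
        self.mlp = nn.Sequential(
            nn.Linear(len(cat_cardinalities) * d + num_numerical, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # e.g. a single logit for binary classification
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer category indices; x_num: (batch, n_num) floats.
        tokens = torch.stack(
            [emb(x_cat[:, j]) for j, emb in enumerate(self.cat_embeddings)], dim=1
        )                                  # (batch, n_cat, d)
        contextual = self.encoder(tokens)  # (batch, n_cat, d)
        # Step 4: concatenate flattened contextual embeddings with normalized numericals.
        features = torch.cat([contextual.flatten(1), self.num_norm(x_num)], dim=1)
        return self.mlp(features)

# Example with three categorical and three numerical columns (toy values).
model = TabTransformerSketch(cat_cardinalities=[10, 5, 7], num_numerical=3)
logits = model(torch.randint(0, 5, (8, 3)), torch.randn(8, 3))  # (8, 1)
```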

See Figure 1.1 for a visual representation of the TabTransformer’s architecture.

Figure 1.1 - TabTransformer architecture
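
Step 6 above mentions pretraining with masked columns. Below is a deliberately simplified sketch of that reconstruction objective, reusing the TabTransformerSketch defined earlier; the 30% mask rate, the reuse of category 0 as a mask token, and the per-column heads are illustrative assumptions, not the paper’s exact recipe.

```python
# Simplified sketch of masked-column pretraining (step 6 above).
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_column_loss(model, column_heads, x_cat, mask_prob=0.3, mask_token=0):
    # Randomly hide some categorical cells and reconstruct them from the rest of the row.
    mask = torch.rand(x_cat.shape) < mask_prob
    x_masked = x_cat.clone()
    x_masked[mask] = mask_token                    # simplified: reuse category 0 as "mask"
    tokens = torch.stack(
        [emb(x_masked[:, j]) for j, emb in enumerate(model.cat_embeddings)], dim=1
    )
    contextual = model.encoder(tokens)             # (batch, n_cat, d)
    loss = torch.zeros(())
    for j, head in enumerate(column_heads):        # one reconstruction head per column
        if mask[:, j].any():
            logits = head(contextual[mask[:, j], j])
            loss = loss + F.cross_entropy(logits, x_cat[mask[:, j], j])
    return loss

# Hypothetical usage with the sketch model defined earlier.
cards = [10, 5, 7]
model = TabTransformerSketch(cat_cardinalities=cards, num_numerical=3)
column_heads = nn.ModuleList([nn.Linear(32, c) for c in cards])   # d = 32
loss = masked_column_loss(model, column_heads, torch.randint(0, 5, (8, 3)))
```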

FT Transformer Architecture


The main idea is to create embeddings of both the numerical and categorical features and pass them to the transformer encoder. This approach ensures a more contextually rich representation of the input data than the Tab Transformer, because self-attention is computed across both numerical and categorical features; the Tab Transformer, in contrast, only applies self-attention to the categorical features. Let us now go over each component in detail.

Feature Tokenizer


The feature tokenizer module is the component of the FT-Transformer model that is responsible for converting input features into embeddings. As shown in Figure 1.2, this conversion happens differently for numerical and categorical data.

  1. Numerical features — For each numerical feature x_j, the transformation involves an element-wise multiplication of the feature value x_j by a learned weight vector W_j, followed by the addition of a bias term b_j. This is represented as T_j = b_j + x_j · W_j. The multiplication by W_j allows the model to scale and adjust the influence of the numerical feature, while the bias term b_j gives the model a base representation of the feature from which adjustments can be made. For numerical features, W_j is a weight vector whose dimensionality equals the desired dimensionality d of the feature embeddings, that is, W_j^(num) ∈ R^d.
  2. Categorical features — For each categorical feature x_j, the transformation involves a lookup in an embedding table W_j for the category present in x_j, after which the bias term b_j is added. A one-hot vector e_j is used to perform the lookup, which retrieves the embedding for the specific category of the feature. This is represented as T_j = b_j + e_j^T W_j. This method effectively gives each category in a feature its own unique embedding in the d-dimensional space. For categorical features, W_j is an embedding lookup table; if S_j is the number of unique categories of the j-th categorical feature, then the lookup table W_j for this feature has dimensions S_j × d, that is, W_j^(cat) ∈ R^(S_j × d).

Therefore, in the resulting embeddings, each feature, whether numerical or
categorical, is represented in the same d-dimensional space, which makes it possible to process them uniformly in the subsequent Transformer stages of
the model.
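
A minimal sketch of such a feature tokenizer in PyTorch is shown below; the class and parameter names and d = 32 are illustrative assumptions rather than the reference FT-Transformer code.

```python
# Minimal sketch of the feature tokenizer: every feature becomes a d-dimensional token.
import torch
import torch.nn as nn

class FeatureTokenizerSketch(nn.Module):
    def __init__(self, num_numerical, cat_cardinalities, d=32):
        super().__init__()
        # Numerical: one weight vector W_j in R^d and bias b_j per feature.
        self.num_weight = nn.Parameter(torch.randn(num_numerical, d))
        self.num_bias = nn.Parameter(torch.zeros(num_numerical, d))
        # Categorical: one embedding table W_j in R^(S_j x d) and bias b_j per feature.
        self.cat_tables = nn.ModuleList(
            [nn.Embedding(S_j, d) for S_j in cat_cardinalities]
        )
        self.cat_bias = nn.Parameter(torch.zeros(len(cat_cardinalities), d))

    def forward(self, x_num, x_cat):
        # T_j = b_j + x_j * W_j  (element-wise scaling of a learned vector)
        num_tokens = self.num_bias + x_num.unsqueeze(-1) * self.num_weight
        # T_j = b_j + e_j^T W_j  (an index lookup is equivalent to the one-hot product)
        cat_tokens = torch.stack(
            [table(x_cat[:, j]) for j, table in enumerate(self.cat_tables)], dim=1
        ) + self.cat_bias
        # Every feature now lives in the same d-dimensional space.
        return torch.cat([num_tokens, cat_tokens], dim=1)  # (batch, n_num + n_cat, d)
```

Built this way, the tokenizer turns a (batch, n_num) float tensor and a (batch, n_cat) integer tensor into a single (batch, n_num + n_cat, d) token sequence that the transformer can process uniformly.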

Concatenation of Numerical and Categorical Features


The numerical and categorical feature embeddings are concatenated, and the concatenated sequence is denoted T. A [CLS] token is then added at the beginning of the sequence, so the input to the transformer is

T = stack([CLS], T).
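
As a small sketch (assuming PyTorch and a tokenizer output like the one from the previous section), prepending a learned [CLS] vector can be written as follows.

```python
# Prepend a learned [CLS] embedding to the tokenized features (illustrative names).
import torch
import torch.nn as nn

d = 32
cls_token = nn.Parameter(torch.randn(1, 1, d))   # learned [CLS] vector

tokens = torch.randn(8, 5, d)                    # e.g. feature-tokenizer output: (batch, n_features, d)
with_cls = torch.cat([cls_token.expand(8, -1, -1), tokens], dim=1)
print(with_cls.shape)                            # torch.Size([8, 6, 32]): [CLS] is now the first token
```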

Transformer


The input sequence is processed through the transformer encoder, which mirrors the original transformer design proposed by Vaswani and colleagues. A classification or regression head, depending on the task at hand, is affixed to the first token emanating from the final layer of the transformer encoder. Figure 1.2 depicts the architecture of the FT-Transformer.

Figure 1.2 - FT-Transformer architecture
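
Putting the pieces of this section together, the following is a hedged end-to-end sketch of the FT-Transformer flow, reusing the FeatureTokenizerSketch from the feature tokenizer section; the layer sizes and names are assumptions for illustration, not the reference implementation.

```python
# End-to-end sketch: tokenizer -> [CLS] -> transformer encoder -> head on the first token.
import torch
import torch.nn as nn

class FTTransformerSketch(nn.Module):
    def __init__(self, num_numerical, cat_cardinalities, d=32, n_heads=4,
                 n_layers=2, out_dim=1):
        super().__init__()
        self.tokenizer = FeatureTokenizerSketch(num_numerical, cat_cardinalities, d)
        self.cls_token = nn.Parameter(torch.randn(1, 1, d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d, out_dim)   # classification or regression head

    def forward(self, x_num, x_cat):
        tokens = self.tokenizer(x_num, x_cat)                   # (batch, n, d)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)     # (batch, 1, d)
        hidden = self.encoder(torch.cat([cls, tokens], dim=1))  # (batch, n + 1, d)
        return self.head(hidden[:, 0])      # prediction from the first ([CLS]) token

# Toy usage: three numerical and three categorical columns.
model = FTTransformerSketch(num_numerical=3, cat_cardinalities=[10, 5, 7])
preds = model(torch.randn(8, 3), torch.randint(0, 5, (8, 3)))   # (8, 1)
```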

Conclusion

As we survey the cutting-edge innovations in the Tab Transformer and FT-Transformer architectures, it’s clear that the future of tabular data processing is being rewritten. The feature tokenizer and the seamless concatenation of numerical and categorical features have revolutionized the way we handle structured data, breaking free from the constraints of traditional methods. With transformers at the core, these architectures merge the strengths of both worlds, delivering strong performance and insight. This convergence not only enhances data representation but also sets a new standard for accuracy and efficiency in machine learning tasks. The fusion of these technologies marks a pivotal moment in AI, where the power of transformers is fully unleashed, transforming how we interpret and utilize tabular data in ways that are both groundbreaking and game-changing.


Written by A.I Hub

We write about Data Science | Software Development | Machine Learning | Artificial Intelligence | Ethical Hacking, and much more. Unleash your potential with us.
