Transforming AI with Hugging Face Models
Imagine harnessing the power of some of the most advanced AI models at your fingertips, ready to tackle complex language tasks with remarkable precision. Hugging Face Models delivers on this promise, offering a rich repository of state-of-the-art pre-trained models that have transformed the landscape of natural language processing. From generating human-like text to performing intricate language translations, Hugging Face Models provides the tools that drive innovation and enable groundbreaking applications in AI. Dive into a world where cutting-edge technology meets ease of use, revolutionizing how we interact with and leverage artificial intelligence.
Table of Contents
- Model
- Environment Setup
- Training
Model
There are over 150K models on Hugging Face covering many machine learning
tasks. In an earlier example, we discussed sentiment analysis on the IMDB
dataset using a pre-trained transformer model.
In this section, we will focus on fine-tuning a Stable Diffusion model using
Dreambooth. Dreambooth is a fine-tuning technique for text-to-image diffusion
models, which generate images based on the text input provided. Typically, a
text-to-image model can generate an image based on a prompt like "a man
climbing Mount Everest". However, if you want the model to generate an
image of you climbing Mount Everest, Dreambooth can be used to fine-tune the model with a few images of yourself. Once the model is fine-tuned, it
can generate any image that includes you doing something. This makes
Dreambooth an effective tool for subject-driven image generation.
Environment Setup
Please activate your Conda environment and enter the following command
in the terminal. Follow the prompts and provide the necessary parameters
accordingly.
accelerate config
The accelerate config command sets up the default parameters for the
Accelerate library, such as the precision mode (for example, mixed
precision or full precision), the gradient accumulation settings, and
other related options. Run this command before executing any PyTorch
script with Accelerate to ensure the library is configured correctly.
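The accelerate config command is interactive; your answers are saved to a small YAML file that accelerate launch reads later. As a rough illustration, the generated file might look something like the following (the values shown are assumptions for a single-machine setup; the exact keys depend on your Accelerate version and your answers to the prompts):

```yaml
# Example ~/.cache/huggingface/accelerate/default_config.yaml
# Illustrative values only; your answers to the prompts determine the real file.
compute_environment: LOCAL_MACHINE
distributed_type: "NO"        # no multi-GPU/multi-node distribution
mixed_precision: fp16         # or "no" for full precision
num_processes: 1
use_cpu: false
```

If you later want different settings, you can simply rerun accelerate config to regenerate this file.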
Training
Make sure to download train_dreambooth.py from the diffusers GitHub
repository and place it on your computer:
https://github.com/huggingface/diffusers/tree/main/examples/dreambooth. Then run the following commands in your terminal. This will start
training the Dreambooth model.
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="provide your file path here"
export OUTPUT_DIR="provide your file path here"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks boy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000
To provide a better understanding of the parameters mentioned above, let us
explore their meanings in detail. To fine-tune the Dreambooth model, we need
to export three variables in the command line interface.
- The first variable is MODEL_NAME, which specifies the name of the base
model. In this case, we are using the Stable Diffusion model.
- The second variable is INSTANCE_DIR, which specifies the location of the
photos used for fine-tuning the model. We recommend using 5-10 images in
PNG format. I used three photos in my fine-tuning. You could use any
subject, including a cat, a flower, yourself, and so on. However, a photo
with a clear face and a transparent background seems to work better.
- The third variable is OUTPUT_DIR, which specifies the directory where
the fine-tuned model will be saved. Please ensure that this directory is
empty before running the code.
- The instance_prompt parameter is a crucial identifier that is required
for inference. In the provided code, the instance_prompt is set as "a
photo of sks boy". Please ensure that you provide an appropriate
identifier for your training process, as this will be necessary for
accurate inference.
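Since the script trains at a fixed resolution (512 here), it can help to normalize your instance images before pointing INSTANCE_DIR at them. Below is a minimal sketch using Pillow; resize_for_dreambooth is a hypothetical helper for this article, not part of the diffusers example script:

```python
# Sketch: prepare instance images for Dreambooth fine-tuning.
# Assumes Pillow is installed. Center-crops each photo to a square and
# resizes it to the training resolution, saving the result as PNG.
from pathlib import Path
from PIL import Image

def resize_for_dreambooth(src: Path, dst_dir: Path, size: int = 512) -> Path:
    """Center-crop and resize one image to size x size and save it as PNG."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(src).convert("RGB")
    # Crop to a centered square first so the subject is not distorted.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((size, size))
    out = dst_dir / (src.stem + ".png")
    img.save(out, format="PNG")
    return out
```

Running this over each of your 5-10 photos gives a clean, uniformly sized folder to pass as INSTANCE_DIR.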
The total training time for the model may vary based on the specifications of
your computer and the configuration you have set for acceleration. It
typically takes 30 minutes to 1 hour to complete the entire training process.
As an example, I conducted the training on a Mac with an M2 Max
processor and 32 GB of RAM and it took me 45 minutes.
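Once training finishes, the fine-tuned pipeline saved in OUTPUT_DIR can be loaded with the diffusers library to generate images. Here is a minimal sketch; note that the prompt must reuse the sks identifier from instance_prompt, and the device and dtype choices below are assumptions you should adapt to your hardware:

```python
# Sketch: load the fine-tuned Dreambooth model for inference.
# Assumes the diffusers and torch packages are installed and that training
# saved a full pipeline to the OUTPUT_DIR used above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "provide your file path here",  # the OUTPUT_DIR from training
    torch_dtype=torch.float16,      # assumption: half precision on GPU
)
pipe = pipe.to("cuda")  # or "mps" on Apple Silicon

# Reuse the identifier from --instance_prompt ("sks") in the prompt.
image = pipe("a photo of sks boy climbing Mount Everest").images[0]
image.save("sks_boy_everest.png")
```

The key point is the identifier: prompts that omit "sks" will fall back to the base model's generic notion of a boy rather than your fine-tuned subject.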
Conclusion
Ultimately, harnessing the full potential of AI models hinges on the seamless integration of model selection, environment setup, and rigorous training. Choosing the right model is the first crucial step, as it dictates the capabilities and performance of your AI solution. Setting up the environment efficiently ensures that all necessary tools and dependencies are in place, facilitating smooth development and execution. The training process then refines the model, allowing it to learn from data and improve its accuracy and functionality. Together, these elements create a powerful synergy that drives the success of AI projects, transforming innovative ideas into tangible, high-performing applications. Hugging Face Models epitomizes this process, offering pre-trained models and a streamlined setup that accelerate development and push the boundaries of what's possible in artificial intelligence.