Microsoft Phi-2 DPO Fine-Tuned

2 minute read

Fine-tuning Microsoft’s Phi-2 Language Model with DPO

Introduction

In this blog, we’ll fine-tune a cutting-edge language model from Microsoft, known as Phi-2, with Direct Preference Optimization (DPO). We’ll do this using an open-source preference dataset (Intel/orca_dpo_pairs), modern Python libraries from the Hugging Face ecosystem, and a Google Colab GPU runtime. The code examples are in Python and are written for simplicity and clarity.

TL;DR

  • Try the model here: https://huggingface.co/akshay326/akshay326-dpo-finetuned-phi-2 (a minimal loading snippet follows below)
  • Finetune the model on your own on Google Colab: https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4#scrollTo=YpdkZsMNylvp
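
If you just want to try the fine-tuned checkpoint outside the notebook, the snippet below is a minimal loading sketch; it assumes a recent transformers install (plus accelerate for device_map="auto") and enough memory for a 2.7B-parameter model. The example prompt is purely illustrative.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "akshay326/akshay326-dpo-finetuned-phi-2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt
inputs = tokenizer("What is a large language model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))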

Import Necessary Libraries

# -*- coding: utf-8 -*-
"""Fine-tune Phi-2 model with DPO.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4
"""

# Install transformers from source so the latest Phi-2 support is available
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers

# Datasets, DPO training (trl), LoRA adapters (peft), 4-bit quantization (bitsandbytes),
# tokenization (sentencepiece) and experiment tracking (wandb)
!pip install -q datasets trl peft bitsandbytes sentencepiece wandb

import os
import gc
import torch

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import DPOTrainer
import bitsandbytes as bnb
from google.colab import userdata
import wandb

Here, we import the deep learning and language modeling libraries we need: Transformers, Datasets, PEFT, bitsandbytes, and TRL (which provides the DPOTrainer used for DPO training), along with Weights & Biases for experiment tracking.

Setup Tokens and Model Names

# Defined in the secrets tab in Google Colab
hf_token = userdata.get('HF_TOKEN')
wb_token = userdata.get('WANDB_API_KEY')
wandb.login(key=wb_token)

model_name = "microsoft/phi-2"
new_model = "akshay326-dpo-finetuned-phi-2"

For security purposes, we retrieve the Hugging Face and Weights & Biases tokens from Google Colab’s secrets tab.

Load and Format the Dataset

def chatml_format(example):
    # Format system
    if len(example['system']) > 0:
        message = {"role": "system", "content": example['system']}
        system = tokenizer.apply_chat_template([message], tokenize=False)
    else:
        system = ""

    # Format the user instruction
    message = {"role": "user", "content": example['question']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)

    # The chosen and rejected answers come straight from the preference pairs
    chosen = example['chosen']
    rejected = example['rejected']

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

# Load dataset
dataset = load_dataset("Intel/orca_dpo_pairs")['train']

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True,)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Keep the original column names so they can be dropped after formatting
original_columns = dataset.column_names

# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)

# Print sample
dataset[1]

In this block, we load the Intel/orca_dpo_pairs preference dataset and map each example into the prompt, chosen, and rejected fields that DPOTrainer expects.

Training the Model with DPO

# Use bfloat16 when the GPU supports it, otherwise fall back to fp16
HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    report_to="wandb",
)
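
# The original post skips over the model loading and DPOTrainer construction that
# dpo_trainer.train() below relies on. The block that follows is a minimal sketch of
# those steps: the quantization settings, LoRA target_modules and DPO hyperparameters
# are illustrative assumptions, not values from the original run.

# 4-bit quantization so the 2.7B-parameter model fits comfortably in Colab memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if HAS_BFLOAT16 else torch.float16,
)

# LoRA adapter config; the target_modules listed here are assumed names for Phi-2's
# linear layers; print the loaded model to confirm them for your transformers version
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# Policy model that will be fine-tuned
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

# Frozen reference model used by the DPO loss to measure divergence from the base policy
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# The DPO trainer ties the policy, reference model, dataset and training arguments together.
# Keyword names follow the trl DPOTrainer API at the time of writing; newer releases may
# move beta / max_length into a DPOConfig instead.
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)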

# Fine-tune model with DPO
dpo_trainer.train()

This block contains the main logic for fine-tuning Phi-2 with Direct Preference Optimization (DPO): the TrainingArguments control batch size, gradient accumulation, learning-rate schedule, and mixed precision, and dpo_trainer.train() runs the optimization.

Saving, Uploading, and Running Inference

# Save artifacts
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Flush memory
del dpo_trainer, model, ref_model
gc.collect()
torch.cuda.empty_cache()

# Reload the base model in FP16 and merge the LoRA adapter into it
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, "final_checkpoint")
model = model.merge_and_unload()

# Save model and tokenizer
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

# Push them to the HF Hub
model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)

# Build a text-generation pipeline around the merged model
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate text from an example prompt (illustrative)
prompt = "What is a large language model?"
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Once fine-tuning is complete, we merge the LoRA adapter into the base model, save the result, and upload both model and tokenizer with the push_to_hub method. We then use the pipeline utility for inference and print the generated text.

That’s it! This post covered how to fine-tune Microsoft’s Phi-2 model using Direct Preference Optimization. Happy coding!
