Fine-tuning Microsoft’s Phi-2 Language Model with DPO
Introduction
In this blog, we’ll focus on fine-tuning a cutting-edge language model from Microsoft, known as Phi-2, with Direct Preference Optimization (DPO). We’ll do this using an open-source preference dataset, modern Python libraries, and the power of Google’s Colaboratory. Our code examples are in Python and are designed for simplicity and clarity.
TL;DR
- Try the model here: https://huggingface.co/akshay326/akshay326-dpo-finetuned-phi-2
- Fine-tune the model yourself on Google Colab: https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4#scrollTo=YpdkZsMNylvp
Import Necessary Libraries
# -*- coding: utf-8 -*-
"""Fine-tune Phi-2 model with DPO.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4
"""
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers
!pip install -q datasets trl peft bitsandbytes sentencepiece wandb
import os
import gc
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import DPOTrainer
import bitsandbytes as bnb
from google.colab import userdata
import wandb
Here, we import the deep learning and language-modeling libraries we need, such as Transformers, Datasets, PEFT, bitsandbytes, and TRL (which provides the DPOTrainer used for DPO-based training).
Setup Tokens and Model Names
# Defined in the secrets tab in Google Colab
hf_token = userdata.get('HF_TOKEN')
wb_token = userdata.get('WANDB_API_KEY')
wandb.login(key=wb_token)
model_name = "microsoft/phi-2"
new_model = "akshay326-dpo-finetuned-phi-2"
For security purposes, we retrieve the Hugging Face and Weights & Biases tokens from Google Colab’s secrets tab.
Load and Format the Dataset
def chatml_format(example):
    # Format system message
    if len(example['system']) > 0:
        message = {"role": "system", "content": example['system']}
        system = tokenizer.apply_chat_template([message], tokenize=False)
    else:
        system = ""

    # Format instruction
    message = {"role": "user", "content": example['question']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)

    # Format chosen answer
    chosen = example['chosen'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }
# Load dataset
dataset = load_dataset("Intel/orca_dpo_pairs")['train']
# Save the original column names so they can be dropped after formatting
original_columns = dataset.column_names
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
# Print sample
dataset[1]
In this block, we load the Intel/orca_dpo_pairs preference dataset and reformat each record into the ChatML-style prompt, chosen, and rejected fields that DPOTrainer expects.
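After mapping, every record is reduced to three plain-text fields. The sketch below shows the expected shape of one formatted record; the content is placeholder text for illustration, not an actual row from the dataset:
# Shape of one formatted record (placeholder content, not a real dataset row)
formatted_example = {
    "prompt": (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\nName the largest planet in our solar system.<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "chosen": "The largest planet in our solar system is Jupiter.<|im_end|>\n",
    "rejected": "Earth is the largest planet.<|im_end|>\n",
}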
Training the Model with DPO
# Use bfloat16 on GPUs that support it (e.g. A100); otherwise fall back to fp16
HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    report_to="wandb",
)
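Before dpo_trainer.train() can run, the trainer itself has to be constructed from a quantized base model, a frozen reference model, a LoRA configuration, and the formatted dataset. The notebook elides that setup, so here is a minimal sketch of what it might look like; the LoRA hyperparameters and target module names are illustrative assumptions for Phi-2, and the DPOTrainer call follows the TRL API that was current when this notebook was written (newer TRL releases have changed this interface).
# LoRA configuration (r, alpha, dropout and target modules are illustrative choices)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed Phi-2 attention projections
)

# Base model to fine-tune, loaded in 4-bit so it fits on a Colab GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model.config.use_cache = False

# Frozen reference model for the DPO loss
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# DPO trainer (signature as in the TRL release used at the time of writing)
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)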
# Fine-tune model with DPO
dpo_trainer.train()
This code chunk contains the main logic to fine-tune the Phi-2 model using Direct Preference Optimization (DPO): roughly 200 optimizer steps with gradient accumulation over 4 micro-batches, a cosine learning-rate schedule with 100 warmup steps, and metrics reported to Weights & Biases.
Saving, Uploading, and Running Inference
# Save artifacts
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")
# Flush memory
del dpo_trainer, model, ref_model
gc.collect()
torch.cuda.empty_cache()
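Note that base_model is not defined at this point, because the quantized training copy was just deleted to free memory. Before merging the LoRA adapter, the base Phi-2 weights are reloaded in full precision; a minimal sketch of that step:
# Reload the base model in FP16 (not 4-bit) so the adapter can be merged cleanly
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)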
# Merge base model with the adapter
model = PeftModel.from_pretrained(base_model, "final_checkpoint")
model = model.merge_and_unload()
# Save model and tokenizer
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
# Push them to the HF Hub
model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
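The generation snippet below assumes that a text-generation pipeline and a ChatML-formatted prompt already exist. The notebook elides that setup, so here is one way it might look; the user question is a placeholder:
# Build a ChatML-style prompt (the question here is just a placeholder)
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Text-generation pipeline backed by the merged model saved above
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
)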
# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
Once fine-tuning is complete, we save the merged model and tokenizer and upload them to the Hugging Face Hub with the push_to_hub method. We then use the transformers pipeline utility to run inference and print the generated text.
That’s it! This post covered how to fine-tune Microsoft’s Phi-2 model using Direct Preference Optimization. Happy coding!