This guide walks through implementing LoRA (Low-Rank Adaptation) with the popular PEFT library to fine-tune a Hugging Face model. The approach is parameter-efficient and works well with large language models.
Step-by-Step Implementation of LoRA Using PEFT
1. Install Required Libraries
Make sure you have the required libraries installed:
pip install transformers peft datasets accelerate
2. Import Necessary Libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, PeftModel
from datasets import load_dataset
3. Load a Pre-trained Model and Tokenizer
For this example, we’ll use a pre-trained GPT-2 model.
# Load GPT-2 model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Ensure the tokenizer uses padding
tokenizer.pad_token = tokenizer.eos_token
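Optionally, you can record the base model's size before adding adapters; GPT-2 (small) has roughly 124 million parameters, which gives a useful baseline for the LoRA comparison later:
# Baseline: count the base model's parameters (~124M for GPT-2 small)
print(f"Base parameters: {sum(p.numel() for p in model.parameters()):,}")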
4. Configure LoRA
Define the LoRA parameters. Here, we configure rank, alpha, and the specific layers to apply LoRA.
# Define LoRA configuration
lora_config = LoraConfig(
    task_type="CAUSAL_LM",      # Task type (e.g., causal language modeling)
    inference_mode=False,       # Set to True for inference-only use
    r=8,                        # Low-rank dimension
    lora_alpha=32,              # Scaling factor
    lora_dropout=0.1,           # Dropout for LoRA layers
    target_modules=["c_attn"],  # GPT-2's attention projection layers
)
# Apply LoRA to the model
peft_model = get_peft_model(model, lora_config)
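Under the hood, LoRA leaves each frozen weight matrix W untouched and learns a low-rank update ΔW = (α/r)·B·A. PEFT's built-in helper shows how few parameters this leaves trainable:
# Show trainable vs. total parameter counts (a built-in PEFT helper)
peft_model.print_trainable_parameters()
# With r=8 on GPT-2's c_attn layers this should report on the order of
# ~0.3M trainable parameters out of ~124M total (well under 1%)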
5. Prepare Dataset
Load and preprocess a dataset for fine-tuning. Let’s use the Hugging Face “wikitext” dataset.
# Load a sample dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

# Drop the raw text column so the collator only sees token tensors
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
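A quick check that the tokenization behaved as expected; each example should now carry input_ids and an attention_mask of length 512:
# Sanity-check one tokenized example
sample = tokenized_dataset[0]
print(list(sample.keys()))        # expect ['input_ids', 'attention_mask']
print(len(sample["input_ids"]))   # 512, from padding="max_length"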
6. Fine-Tune the Model
Fine-tune the model with the Hugging Face Trainer; only the small set of LoRA adapter parameters is updated.
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
# Define training arguments
training_args = TrainingArguments(
    output_dir="./lora-fine-tuned",  # Output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    logging_dir="./logs",
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
    learning_rate=5e-4,  # LoRA typically tolerates a higher learning rate than full fine-tuning
)
# Use a data collator that copies input_ids into labels for causal LM loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Define a Trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
# Fine-tune the model
trainer.train()
7. Save and Use the Fine-Tuned Model
After fine-tuning, save the LoRA adapter weights so they can be reloaded for inference.
# Save the LoRA adapter weights
peft_model.save_pretrained("./lora-fine-tuned")

# Load the adapter on top of a fresh base model for inference
base_model = AutoModelForCausalLM.from_pretrained(model_name)
fine_tuned_model = PeftModel.from_pretrained(base_model, "./lora-fine-tuned")
# Generate text with the fine-tuned model
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = fine_tuned_model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
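If you'd rather deploy a single standalone checkpoint, PEFT can fold the adapters back into the base weights via merge_and_unload. This is a minimal sketch; the output path ./lora-merged is a placeholder:
# Merge the LoRA adapters into the base weights and save a standalone model
merged_model = fine_tuned_model.merge_and_unload()
merged_model.save_pretrained("./lora-merged")  # placeholder path
tokenizer.save_pretrained("./lora-merged")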
Key Notes
• Trainable Parameters: LoRA cuts the number of trainable parameters dramatically by updating only the low-rank adapters while the rest of the model stays frozen.
• Memory Efficiency: This makes it practical to fine-tune large models on modest hardware.
• Dataset: You can swap in your own text data by preprocessing it the same way; see the sketch below.
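As a sketch of the last point, the datasets library can read a plain text file directly (my_corpus.txt below is a hypothetical placeholder):
# Load your own text file and reuse the same tokenization step
custom_dataset = load_dataset("text", data_files={"train": "my_corpus.txt"}, split="train")
custom_tokenized = custom_dataset.map(tokenize_function, batched=True, remove_columns=["text"])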