Faizan's LLM: Building a Large Language Model from Scratch

Main tool: Python
Technique: Multi-head Attention
Industry: Generative AI

📚 About the Project

🚀 Faizan's LLM: Building a Large Language Model from scratch, then pretraining and fine-tuning it for classification and instruction following.

🌟 Overview

This repository contains the code and documentation for a Transformer-based Large Language Model (LLM) that I built from scratch, pretrained, and then fine-tuned. It covers essential topics such as:

  • πŸ† Fine-tuning for classification and instruction-following tasks
  • πŸ“š Pretraining a Transformer model from scratch
  • 🎯 Low-Rank Adaptation (LoRA) for efficient fine-tuning
  • πŸ“‰ Cosine decay learning rate scheduling
  • πŸš€ Gradient clipping for stable training
  • ⚑ Detailed implementation of Transformer architectures

🔥 Features

  • 🧠 Fine-Tuning Classification Models: Train a Transformer-based model on a classification dataset.
  • 📝 Fine-Tuning for Instruction Following: Fine-tune the model on an instruction dataset so it learns to follow natural-language instructions.
  • 🛠️ Pretraining from Scratch: Build and train a Transformer model with custom tokenization.
  • 📊 LoRA Integration: Implement LoRA for parameter-efficient fine-tuning.
  • 📈 Cosine Decay Scheduler: Adjust the learning rate dynamically for smooth convergence.
  • 🛑 Gradient Clipping: Prevent exploding gradients during training.
  • ⚙️ Transformer Architecture: Custom implementation of multi-head attention, layer normalization, and feed-forward networks.

📂 Folder Structure (study the files in the following order)

├── dataprocessing.ipynb            # Processing the data for the LLM
├── transformer.ipynb               # Transformer architecture implementation
├── LLMcore.py                      # Core classes and functions for the LLM
├── gpt_download.py                 # Download the pretrained GPT-2 model parameters
├── pretraining.ipynb               # Pretraining a Transformer from scratch
├── weightloading.ipynb             # Load the weights from the pretrained model
├── finetuningclassification.ipynb  # Fine-tuning on classification tasks
├── finetuninginstruction.ipynb     # Fine-tuning for instruction following
├── README.md                       # Documentation

🛠 Installation

Ensure you have the required dependencies installed before running the notebooks.

pip install torch transformers datasets


📌 Usage

🔹 Fine-Tuning Classification Model

Classification fine-tuning was performed on GPT-2 small (124M parameters), with the pretrained weights loaded before fine-tuning. Run the finetuningclassification.ipynb notebook to train a Transformer-based classifier; a minimal sketch of the setup follows.
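
The snippet below is a minimal sketch of the classification fine-tuning idea, not the repository's exact code: it uses the Hugging Face GPT2Model (already listed in the install command) as a stand-in for the weights fetched by gpt_download.py, attaches a small classification head, and trains on the final token's hidden state. The two-class setup and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model  # stand-in for the weights fetched by gpt_download.py

backbone = GPT2Model.from_pretrained("gpt2")   # GPT-2 small, 124M parameters
num_classes = 2                                # hypothetical binary classification task

# Freeze the pretrained backbone and train only a small classification head.
for p in backbone.parameters():
    p.requires_grad = False
head = nn.Linear(backbone.config.n_embd, num_classes)

optimizer = torch.optim.AdamW(head.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()

def training_step(input_ids: torch.Tensor, labels: torch.Tensor) -> float:
    """input_ids: (batch, seq_len) token IDs; labels: (batch,) class indices."""
    hidden = backbone(input_ids).last_hidden_state   # (batch, seq_len, emb_dim)
    logits = head(hidden[:, -1])                     # classify from the last token's state
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```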

🔹 Fine-Tuning for Instruction Following

Instruction fine-tuning was performed on GPT-2 medium (355M parameters), with the pretrained weights loaded before fine-tuning. Use the finetuninginstruction.ipynb notebook to fine-tune the LLM for instruction-following tasks; a sketch of the prompt formatting follows.
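
Instruction-tuning datasets are usually serialized into a single prompt string per example. The helper below is a hedged illustration using the common Alpaca-style template; the exact wording and field names used in finetuninginstruction.ipynb may differ.

```python
def format_instruction_example(entry: dict) -> str:
    """Turn an {'instruction', 'input', 'output'} record into one training prompt.

    Follows the widely used Alpaca-style template; this is an assumption, not
    necessarily the template used in the notebook.
    """
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    if entry.get("input"):                      # the input field is optional
        prompt += f"\n\n### Input:\n{entry['input']}"
    prompt += f"\n\n### Response:\n{entry['output']}"
    return prompt


example = {
    "instruction": "Rewrite the sentence in the passive voice.",
    "input": "The chef cooked the meal.",
    "output": "The meal was cooked by the chef.",
}
print(format_instruction_example(example))
```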

🔹 Pretraining from Scratch

To pretrain a Transformer model from scratch, execute the pretraining.ipynb notebook; the core next-token-prediction step is sketched below.
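
Pretraining optimizes a next-token-prediction objective: the targets are the input tokens shifted one position to the left, and the loss is cross-entropy over the vocabulary. The function below is a minimal sketch of one training step, assuming `model` is any GPT-style module that returns (batch, seq_len, vocab_size) logits; it is not the notebook's exact loop.

```python
import torch
import torch.nn as nn

def pretraining_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                     input_ids: torch.Tensor) -> float:
    """One next-token-prediction step on a batch of (batch, seq_len) token IDs."""
    inputs, targets = input_ids[:, :-1], input_ids[:, 1:]   # predict token t+1 from tokens 0..t
    logits = model(inputs)                                   # (batch, seq_len-1, vocab_size)
    loss = nn.functional.cross_entropy(
        logits.flatten(0, 1),                                # (batch*(seq_len-1), vocab_size)
        targets.flatten(),                                   # (batch*(seq_len-1),)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```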

🔹 Transformer Architecture

The transformer.ipynb notebook provides an in-depth implementation of Transformer blocks, including the components listed below (a self-contained sketch of the attention block follows the list):

  • 📌 Token and positional embeddings
  • 📌 Multi-head self-attention
  • 📌 Layer normalization
  • 📌 Feed-forward networks
  • 📌 Residual connections
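
As a reference point, here is a compact, self-contained sketch of causal multi-head self-attention in PyTorch. It illustrates the same technique implemented in transformer.ipynb, but the class layout, dimension names, and hyperparameters (768-dim embeddings, 12 heads) are illustrative assumptions rather than the repository's exact code.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head self-attention (illustrative sketch)."""

    def __init__(self, emb_dim: int, num_heads: int, context_len: int, dropout: float = 0.1):
        super().__init__()
        assert emb_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = emb_dim // num_heads
        self.qkv = nn.Linear(emb_dim, 3 * emb_dim)      # joint query/key/value projection
        self.out_proj = nn.Linear(emb_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future positions (causality).
        mask = torch.triu(torch.ones(context_len, context_len, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the embedding dimension into heads: (b, num_heads, t, head_dim)
        q, k, v = (z.reshape(b, t, self.num_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5   # scaled dot-product scores
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).reshape(b, t, d)      # merge heads back together
        return self.out_proj(out)


# Quick shape check with illustrative GPT-2-small-like dimensions.
attn = MultiHeadAttention(emb_dim=768, num_heads=12, context_len=1024)
x = torch.randn(2, 16, 768)
print(attn(x).shape)  # torch.Size([2, 16, 768])
```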

📊 Training Details

🎯 Model Performance and Evaluation

  • ✅ Accuracy scores for the classification fine-tuned LLM:
    • Training accuracy: 94.81%
    • Validation accuracy: 96.53%
    • Test accuracy: 92.57%
  • ✅ Accuracy score for the instruction fine-tuned LLM:
    • Score: 45.84, as adjudicated by the GPT-3.5 Turbo model.
    • There is room for improvement by tuning the learning rate, batch size, cosine-decay schedule, LoRA configuration, and model size.
    (figure: Training Accuracy)
  • 📉 Pretraining Loss Curve (figure: Pretraining Loss)
  • 🔥 Temperature Scaling in Pretraining (figure: Temperature Scaling)
  • 📊 Loss Curves for Classification Fine-Tuning (figure: Loss Curves)
  • 🔍 Classification Fine-Tuning Performance (figure: Classification Fine-Tuning)
  • 📝 Instruction Fine-Tuning Results (figure: Instruction Fine-Tuning)

📉 Cosine Decay Learning Rate

The learning rate is adjusted with a cosine decay schedule for stable convergence; a minimal sketch follows.
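
The snippet below sketches the idea using PyTorch's built-in CosineAnnealingLR scheduler; the notebooks may implement the schedule manually (for example with an added warmup phase), and the learning-rate values here are illustrative.

```python
import torch

model = torch.nn.Linear(10, 10)                    # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

num_epochs = 10
# Cosine decay: the learning rate falls from its peak (5e-4) to eta_min
# along a half-cosine curve over T_max scheduler steps.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=1e-5
)

for epoch in range(num_epochs):
    # ... run the training batches for this epoch ...
    scheduler.step()                               # decay once per epoch
    print(epoch, scheduler.get_last_lr())
```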

🚀 Gradient Clipping

To prevent instability, gradients are clipped during backpropagation, as in the sketch below.
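
A minimal sketch of global-norm gradient clipping with torch.nn.utils.clip_grad_norm_; the placeholder model and the clip value of 1.0 are illustrative assumptions, not necessarily what the notebooks use.

```python
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(10, 1)                     # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global L2 norm is at most 1.0, so a single
# bad batch cannot produce an exploding parameter update.
clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```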

πŸ† Low-Rank Adaptation (LoRA)

Β 

LoRA is implemented to enable efficient fine-tuning with minimal computational cost.
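
The wrapper below is a minimal sketch of the LoRA idea, adding a trainable low-rank update (alpha/r)·B·A on top of a frozen linear layer so that only a small number of parameters are updated. The rank and alpha values, and the layer being wrapped, are illustrative assumptions rather than the repository's settings.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W·x + (alpha/r)·B·A·x."""

    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False                          # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.empty(rank, linear.in_features))
        self.lora_B = nn.Parameter(torch.zeros(linear.out_features, rank))  # B starts at zero
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original output plus the low-rank correction; only A and B receive gradients.
        return self.linear(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Example: wrap a 768x768 projection layer (illustrative size).
proj = nn.Linear(768, 768)
lora_proj = LoRALinear(proj, rank=8, alpha=16.0)
print(lora_proj(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])
```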

📚 References

  • 📖 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
  • 📖 Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training.
  • 📖 Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models.
  • 📖 Sebastian Raschka. Build a Large Language Model (From Scratch).
  • 📖 Jay Alammar & Maarten Grootendorst. Hands-On Large Language Models: Language Understanding and Generation.
  • 📖 Andrej Karpathy. Building GPT from Scratch. YouTube.
  • 📖 Krish Naik. Machine Learning and Deep Learning Tutorials. YouTube and Udemy.

💡 Contributing

🎉 Contributions are welcome! Please feel free to submit issues or pull requests.

📜 License

This project is licensed under the MIT License.