{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c983a3ef",
   "metadata": {},
   "source": [
    "# Large Language Models with the Transformers library\n",
    "This notebook shows a few examples of openly available Large Language Models (LLMs) that can be run with the Transformers library from Hugging Face, see https://huggingface.co/docs/transformers/index  \n",
    "Only a small selection of models that fit in the GPU memory of a T4 (16 GB) is presented. Better models (with more parameters) need GPUs with more memory (40 GB or more).   \n",
    "Note: Downloading the models from HuggingFace and loading the weights into the GPU can take several minutes."
   ]
  },
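  {
   "cell_type": "markdown",
   "id": "7b1e4a90",
   "metadata": {},
   "source": [
    "As a rough guide to which models fit on a given GPU, the memory taken by the weights alone can be estimated as the number of parameters times the bytes per parameter. The sketch below assumes `bfloat16` weights (2 bytes each), as used in the cells that follow, and ignores activations and other runtime overhead, so real usage is higher. The helper name `weights_memory_gb`, the nominal parameter counts read off the model names, and the 16 GB T4 capacity are assumptions made for this illustration; none of them comes from the Transformers library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f6c8d15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper, not part of Transformers: weights-only memory estimate\n",
    "def weights_memory_gb(n_params, bytes_per_param=2):\n",
    "    # Approximate size of the model weights in GB (1 GB = 1e9 bytes)\n",
    "    return n_params * bytes_per_param / 1e9\n",
    "\n",
    "T4_MEMORY_GB = 16  # nominal T4 memory, assumed for this illustration\n",
    "\n",
    "# Nominal parameter counts read off the model names (assumption)\n",
    "for name, n_params in [(\"dolly-v2-3b\", 3e9), (\"falcon-7b\", 7e9), (\"dolly-v2-12b\", 12e9)]:\n",
    "    size = weights_memory_gb(n_params)\n",
    "    verdict = \"fits\" if size < T4_MEMORY_GB else \"does not fit\"\n",
    "    print(f\"{name}: about {size:.0f} GB of weights, {verdict} in {T4_MEMORY_GB} GB\")"
   ]
  },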
  {
   "cell_type": "markdown",
   "id": "1dcbaa71",
   "metadata": {},
   "source": [
    "# Dolly, an LLM\n",
    "Dolly is an open-source LLM available for experimentation, see:\n",
    "https://huggingface.co/databricks/dolly-v2-12b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c6c2c102",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from transformers import pipeline\n",
    "\n",
    "# The dolly-v2-12b model is large and needs more than 16 GB of GPU memory; dolly-v2-3b is smaller and is used here\n",
    "# generate_text = pipeline(model=\"databricks/dolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n",
    "\n",
    "# device=0 places the model on the first GPU\n",
    "generate_text = pipeline(model=\"databricks/dolly-v2-3b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device=0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cc4ee144",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A particle accelerator is a machine that speeds up particles (such as electrons, photons or neutrons) to yield higher-energy particles. Particle accelerators can be divided into three general categories: synchrotron, linear and cyclotron.\n"
     ]
    }
   ],
   "source": [
    "# Query the model\n",
    "\n",
    "res = generate_text(\"What is a particle accelerator?\")\n",
    "print(res[0][\"generated_text\"])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cdcdf6a",
   "metadata": {},
   "source": [
    "# Testing the Falcon 7B model\n",
    "Falcon is an LLM, see https://huggingface.co/blog/falcon"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ce9b4429",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Defaulting to user installation because normal site-packages is not writeable\n",
      "Collecting einops\n",
      "  Using cached einops-0.6.1-py3-none-any.whl (42 kB)\n",
      "Installing collected packages: einops\n",
      "Successfully installed einops-0.6.1\n"
     ]
    }
   ],
   "source": [
    "# Install einops and accelerate if they are not already installed (uncomment to run)\n",
    "\n",
    "# !pip install einops\n",
    "# !pip install accelerate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d33c90de",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "import transformers\n",
    "import torch\n",
    "\n",
    "model = \"tiiuae/falcon-7b-instruct\"\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model)\n",
    "pipeline = transformers.pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    torch_dtype=torch.bfloat16,\n",
    "    trust_remote_code=True,\n",
    "    device_map=\"auto\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9969e8fb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4cuda/Mon/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)\n",
      "  warnings.warn(\n",
      "Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Result: What is a particle accelerator?\n",
      "A particle accelerator is a device used to accelerate and collide particles, typically at high speeds. Particles are typically energized with high-energy beams of radiation and can be accelerated into high speeds, which can then be useful in a number of fields, such as medical imaging and physics research.\n"
     ]
    }
   ],
   "source": [
    "sequences = pipeline(\n",
    "    \"What is a particle accelerator?\",\n",
    "    max_length=1000,\n",
    "    do_sample=True,\n",
    "    top_k=10,\n",
    "    num_return_sequences=1,\n",
    "    eos_token_id=tokenizer.eos_token_id,\n",
    ")\n",
    "for seq in sequences:\n",
    "    print(f\"Result: {seq['generated_text']}\")"
   ]
  },
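  {
   "cell_type": "markdown",
   "id": "9d3a5c7e",
   "metadata": {},
   "source": [
    "The Falcon output above begins with the prompt itself, because the generated text contains the prompt followed by the continuation. A minimal sketch of removing the echoed prompt is shown below; `strip_prompt` is a hypothetical helper written for this notebook, and the sample text is a shortened copy of the output above rather than a new model result. The text-generation pipeline also accepts `return_full_text=False`, which should have a similar effect without post-processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e8b1f26",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: remove the echoed prompt from a generated text\n",
    "def strip_prompt(generated_text, prompt):\n",
    "    if generated_text.startswith(prompt):\n",
    "        # Drop the prompt and any whitespace that follows it\n",
    "        return generated_text[len(prompt):].lstrip()\n",
    "    return generated_text.strip()\n",
    "\n",
    "prompt = \"What is a particle accelerator?\"\n",
    "# Shortened copy of the Falcon output shown above\n",
    "generated = prompt + \"\\nA particle accelerator is a device used to accelerate and collide particles.\"\n",
    "print(strip_prompt(generated, prompt))"
   ]
  },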
  {
   "cell_type": "markdown",
   "id": "333c25dd",
   "metadata": {},
   "source": [
    "# OpenLLaMA\n",
    "OpenLLaMA is an LLM, an open reproduction of LLaMA, see https://huggingface.co/openlm-research/open_llama_3b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "584cc179",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "import transformers\n",
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "34679861",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is an example using transformers with OpenLLaMA\n",
    "model = \"openlm-research/open_llama_3b\"\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model)\n",
    "pipeline = transformers.pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    torch_dtype=torch.bfloat16,\n",
    "    trust_remote_code=True,\n",
    "    device_map=\"auto\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0111d711",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Result: Question: What is a particle accelerator? \n",
      "Answer: It's a machine that speeds things up. \n",
      "Question: Where do particle accelerators work? \n",
      "Answer: They do work at CERN in Geneva Switzerland.\n",
      "\n",
      "#### 4\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sequences = pipeline(\n",
    "    \"Question: What is a particle accelerator? \\nAnswer:\",\n",
    "    max_length=60,\n",
    "    do_sample=True,\n",
    "    top_k=10,\n",
    "    num_return_sequences=1,\n",
    "    eos_token_id=tokenizer.eos_token_id,\n",
    ")\n",
    "for seq in sequences:\n",
    "    print(f\"Result: {seq['generated_text']}\")"
   ]
  }
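  ,
  {
   "cell_type": "markdown",
   "id": "b62f0d3a",
   "metadata": {},
   "source": [
    "As the output above shows, the model keeps following the Question/Answer pattern after the first answer and starts a new question. A minimal sketch of keeping only the first answer is shown below; `first_answer` is a hypothetical helper written for this notebook, and the sample text is copied from the beginning of the output above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c05d7a41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: keep only the first answer of a Question/Answer completion\n",
    "def first_answer(generated_text, answer_tag=\"Answer:\", question_tag=\"Question:\"):\n",
    "    # Text that follows the first answer tag\n",
    "    _, _, rest = generated_text.partition(answer_tag)\n",
    "    # Cut at the next question tag, if the model started a new question\n",
    "    answer, _, _ = rest.partition(question_tag)\n",
    "    return answer.strip()\n",
    "\n",
    "# Copy of the beginning of the output shown above\n",
    "generated = (\n",
    "    \"Question: What is a particle accelerator? \\n\"\n",
    "    \"Answer: It's a machine that speeds things up. \\n\"\n",
    "    \"Question: Where do particle accelerators work? \\n\"\n",
    "    \"Answer: They do work at CERN in Geneva Switzerland.\\n\"\n",
    ")\n",
    "print(first_answer(generated))"
   ]
  }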
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}