{ "cells": [ { "cell_type": "markdown", "id": "8ed1c655", "metadata": {}, "source": [ "# Transformers for text processing\n", "Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. \n", "Credits: Huggingface documentation" ] }, { "cell_type": "markdown", "id": "cfa8e403", "metadata": {}, "source": [ "## Text generation\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "e5f8931a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).\n", "Using a pipeline without specifying a model name and revision in production is not recommended.\n" ] } ], "source": [ "from transformers import pipeline\n", "\n", "# Create the text-generation pipeline with GPU\n", "generator = pipeline(\"text-generation\", device=0) # Specify the GPU device index\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "853c6157", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4cuda/Mon/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)\n", " warnings.warn(\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n", "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4cuda/Mon/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/transformers/generation/utils.py:1313: UserWarning: Using `max_length`'s default (50) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n", " warnings.warn(\n" ] }, { "data": { "text/plain": [ "[{'generated_text': 'In this notebook, we show you how to run and deploy custom Docker builds (Rockskin Docker Compile script). We also show how to add a new project for the current directory in a bash script.\\n\\nIn an example, suppose'}]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate text from a prompt\n", "generator(\"In this notebook, we show you how to\")" ] }, { "cell_type": "markdown", "id": "25ff457f", "metadata": {}, "source": [ "## Summarization\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "68ed4127", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).\n", "Using a pipeline without specifying a model name and revision in production is not recommended.\n" ] } ], "source": [ "# Create the summarization pipeline with GPU\n", "summarizer = pipeline(\"summarization\", device=0) # Specify the GPU device index" ] }, { "cell_type": "code", "execution_count": 4, "id": "a855ca3f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'summary_text': ' In quantum field theory, particles can be described as waves in a field . Fundamental ‘quantum fields’ fill the universe and dictate what nature can and cannot do . The Higgs boson first appeared in a scientific paper written by Peter Higgs in 1964 .'}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# text from the CERN website\n", "summarizer(\n", "\"\"\"\n", "What is the Higgs boson?\n", "Animation to show that a particle can also be thought of as a wave in a field\n", "In quantum field theory, particles can be described as waves in a field (Image: Piotr Traczyk/CERN)\n", "\n", "To answer this question needs an exploration into the quantum world and how particles interact…\n", "The particle that we now call the Higgs boson first appeared in a scientific paper written by Peter Higgs in 1964. At that time, physicists were working on describing the weak force – one of the four fundamental forces of Nature – using a framework called quantum field theory.\n", "\n", "Particle, wave or both?\n", "Quantum field theory describes the microscopic world of particles very differently to everyday life. Fundamental “quantum fields” fill the universe and dictate what nature can and cannot do. In this description, every particle can be represented by a wave in a “field”, similar to a ripple on the surface of a vast ocean. One example is the photon, the particle of light, which is a wave in the electromagnetic field. \n", "\n", "Force carriers\n", "When particles interact with one another, they exchange “force carriers”. These force carriers are particles and can also be described as waves in their respective fields. For example, when two electrons interact, they do so by exchanging photons – photons are the force carriers of the electromagnetic interaction.\n", "\n", "Symmetry\n", "Another important component of this picture is symmetry. Just like a shape can be called symmetrical if it doesn’t change when rotated or flipped, similar requirements are placed on the laws of Nature.\n", "For example, the electrical force between particles with an electrical charge of one will always be the same, irrespective of whether the particle is an electron, muon or proton. Such symmetries form the basis and define the structure of the theory.\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "f1ff66b1", "metadata": {}, "source": [ "## Translation\n" ] }, { "cell_type": "code", "execution_count": null, "id": "827963e5", "metadata": {}, "outputs": [], "source": [ "# sentencepiece is needed for this example\n", "# pip install **and restart the notebook**\n", "\n", "!pip install sentencepiece" ] }, { "cell_type": "code", "execution_count": 1, "id": "c257f7a0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4cuda/Mon/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py:194: UserWarning: Recommended: pip install sacremoses.\n", " warnings.warn(\"Recommended: pip install sacremoses.\")\n" ] } ], "source": [ "from transformers import pipeline\n", "\n", "# Create the translator pipeline with GPU\n", "translator = pipeline(\"translation\", model=\"Helsinki-NLP/opus-mt-en-fr\", device=0)\n", "#translator = pipeline(\"translation\", model=\"Helsinki-NLP/opus-mt-en-fr\")\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "51449b3d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'translation_text': \"Le complexe d'accélérateur du CERN est une succession de machines qui accélèrent les particules à des énergies de plus en plus élevées. Chaque machine stimule l'énergie d'un faisceau de particules avant de l'injecter dans la machine suivante dans la séquence. Dans le Grand Collisionneur de Hadron (LHC) – le dernier élément de cette chaîne – les faisceaux de particules sont accélérés jusqu'à l'énergie record de 6,5 TeV par faisceau.\"}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# text from the CERN website\n", "\n", "translator(\n", "\"\"\"\n", "The accelerator complex at CERN is a succession of machines that accelerate particles to increasingly higher energies. \n", "Each machine boosts the energy of a beam of particles before injecting it into the next machine in the sequence. \n", "In the Large Hadron Collider (LHC) – the last element in this chain – particle beams are accelerated up to the record energy of 6.5 TeV per beam.\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "id": "431f95ad", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }