Large Language Models with the Transformers library¶
This notebook shows a few examples of openly available Large Language Models (LLMs) that can be run with the Transformers library from HuggingFace, see https://huggingface.co/docs/transformers/index
Only a small selection of models that fit into the GPU memory of a T4 is presented. Better models (with more parameters) need GPUs with larger amounts of memory (40 GB or more).
Note: Downloading the models from HuggingFace and loading the weights into the GPU can take several minutes.
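Before loading a model it can help to check how much GPU memory is actually available. A minimal sketch using torch.cuda.mem_get_info (assuming a CUDA-capable GPU; device index 0 is an assumption for a single-GPU machine):
In [ ]:
import torch

# Report free and total memory on GPU 0 (device index assumed for a single-GPU setup)
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB total")
else:
    print("No CUDA GPU available")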
Dolly, an open LLM¶
Dolly is an open-source LLM available for experimentation, see: https://huggingface.co/databricks/dolly-v2-12b
In [1]:
import torch
from transformers import pipeline
# the dolly-v2-12b model is large and requires more than 16 GB of GPU memory; dolly-v2-3b is smaller
# generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device=0) # use GPU
In [2]:
# Query the LLM model
res = generate_text("What is a particle accelerator?")
print(res[0]["generated_text"])
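Generation parameters can be passed along with the prompt; a small sketch with illustrative (untuned) sampling values, assuming the custom Dolly pipeline forwards these keyword arguments to the underlying generate() call as a standard text-generation pipeline does:
In [ ]:
# Illustrative sampling parameters, not tuned; assumed to be forwarded to model.generate()
res = generate_text(
    "What is a particle accelerator?",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
)
print(res[0]["generated_text"])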
Testing the Falcon 7B model¶
Falcon is an LLM, see https://huggingface.co/blog/falcon
In [2]:
# Install einops and accelerate if not yet done
# !pip install einops
# !pip install accelerate
In [ ]:
from transformers import AutoTokenizer
import transformers
import torch
model = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
In [2]:
sequences = pipeline(
    "What is a particle accelerator?",
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
OpenLLaMA¶
OpenLLaMA is an open reproduction of Meta's LLaMA, see https://huggingface.co/openlm-research/open_llama_3b
In [1]:
from transformers import AutoTokenizer
import transformers
import torch
In [2]:
# This is an example using transformers with OpenLLaMA
model = "openlm-research/open_llama_3b"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
In [7]:
sequences = pipeline(
    "Question: What is a particle accelerator? \nAnswer:",
    max_length=60,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")