
Llama 3 - Open Model That Is Truly Useful?

Are we approaching the time when we can have a truly useful open model?

Llama looking sharp

The release of Llama 3¹ brings us closer to this reality. With two available sizes - 8B and 70B - the model is designed to be more useful and practical for a wide range of applications. Let's dive into the details of Llama 3 and explore its potential applications.


Llama 3

Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications

Here are the key features of Llama 3:

- Pretrained and instruction-tuned models in 8B and 70B sizes
- A new tokenizer with a 128K-token vocabulary for more efficient text encoding
- Grouped query attention (GQA) for improved inference efficiency
- An 8K-token context window
- Pretraining on over 15T tokens of publicly available data

In benchmarks, Llama 3 has shown impressive performance across various tasks:

|Benchmark|Llama 3 8B|Llama 2 7B|Llama 2 13B|Llama 3 70B|Llama 2 70B|
|---|---|---|---|---|---|
|MMLU (5-shot)|68.4|34.1|47.8|82.0|52.9|
|GPQA (0-shot)|34.2|21.7|22.3|39.5|21.0|
|HumanEval (0-shot)|62.2|7.9|14.0|81.7|25.6|
|GSM-8K (8-shot, CoT)|79.6|25.7|77.4|93.0|57.5|
|MATH (4-shot, CoT)|30.0|3.8|6.7|50.4|11.6|

Even on the LMSYS Chatbot Arena Leaderboard², Llama 3 (70B) has shown impressive performance - it's the first open model to beat some versions of GPT-4 and Claude 3.

The resources used for training:

|Model|Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)|
|---|---|---|---|
|Llama 3 8B|1.3M|700|390|
|Llama 3 70B|6.4M|700|1900|
|Total|7.7M||2290|

7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W)
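For a rough sense of scale, that's on the order of 5 GWh of energy. Here's the back-of-the-envelope estimate from the TDP alone (actual consumption depends on utilization, so treat this as a sketch):

gpu_hours = 7.7e6  # total GPU hours across both models
tdp_watts = 700  # TDP of an H100-80GB
print(gpu_hours * tdp_watts / 1e9)  # watt-hours -> GWh, roughly 5.39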

The secret? Lots of compute and lots of good data. No small company (or startup) can afford this. But I digress: can Llama 3 really be that good?

Model Setup

Want to follow along? The Jupyter notebook is available in this GitHub repository.

To use Llama 3, we need the transformers and accelerate libraries. Let's install them:

pip install -Uqqq pip --progress-bar off
pip install -qqq transformers==4.40.1 --progress-bar off
pip install -qqq accelerate==0.29.3 --progress-bar off

Here's how you can load the model and the tokenizer:

import os
from inspect import cleandoc
 
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from google.colab import userdata
 
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
 
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
 
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
)

We're using the 8B instruct version³ from the Hugging Face Hub. I've loaded the model using a T4 GPU. The model is quite large, so it might take a while to load, and inference is quite slow.
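By the way, if the fp16 weights don't fit in your GPU's memory, one common workaround (not used in this post) is loading the model quantized to 4-bit via the bitsandbytes library. A minimal sketch, assuming bitsandbytes is installed:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# load the weights in 4-bit to roughly quarter the memory footprint
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)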

Prompt Template

Llama 3 comes with a prompt template that can be used to structure the input messages. The model also supports system prompts:

messages = [
    {
        "role": "system",
        "content": "You are a professional machine learning engineer that writes in easy to understand language",
    },
    {
        "role": "user",
        "content": "What are the pros/cons of ChatGPT vs Open Source LLMs?",
    },
]
 
print(tokenizer.apply_chat_template(messages, tokenize=False))
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a professional machine learning engineer that writes in easy to
understand language<|eot_id|><|start_header_id|>user<|end_header_id|>

What are the pros/cons of ChatGPT vs Open Source LLMs?<|eot_id|>

Good thing we have the apply_chat_template() method; I wouldn't want to build this string manually.
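One detail worth knowing: when building a prompt for actual generation (rather than just inspecting the template), you can pass add_generation_prompt=True so the template appends the assistant header and the model knows it should reply:

# append the assistant header so the model continues as the assistant
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # now ends with <|start_header_id|>assistant<|end_header_id|>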

Create Pipeline

Let's set up a text generation pipeline with Llama 3:

from transformers import GenerationConfig, TextStreamer, pipeline
 
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 512
# near-zero temperature with sampling enabled makes generation effectively greedy
generation_config.temperature = 0.0001
generation_config.do_sample = True
 
# stream tokens to stdout as they're generated, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
 
# Llama 3 ends each turn with <|eot_id|>, so treat it as an extra stop token
stop_token_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
 
llm = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    generation_config=generation_config,
    num_return_sequences=1,
    eos_token_id=stop_token_ids,
    streamer=streamer,
)

We'll use the TextStreamer to stream the response, so we can get the output (seemingly) faster.

Prompting

We're ready to prompt the model:

output = llm(messages)
As a machine learning engineer, I'm excited to dive into the pros and cons of
ChatGPT vs Open Source LLMs. Let's break it down!
 
**ChatGPT:**
 
Pros:
 
1. **Ease of use**: ChatGPT is a user-friendly, cloud-based AI model that
   requires minimal technical expertise to integrate into your application.
2. **Pre-trained**: ChatGPT comes pre-trained on a massive dataset, which means
   you can start using it right away without needing to train your own model.
3. **Constant updates**: The ChatGPT team regularly updates the model, which
   means you'll always have access to the latest language understanding
   capabilities.
4. **Scalability**: ChatGPT is designed to handle large volumes of
   conversations, making it suitable for high-traffic applications.
 
Cons:
 
1. **Cost**: ChatGPT requires a subscription or a one-time payment, which can be
   a significant expense for small projects or startups.
2. **Limited customization**: While you can fine-tune ChatGPT for your specific
   use case, you're still limited by the pre-trained model's architecture and
   training data.
3. **Dependence on the provider**: Your application's success is tied to the
   ChatGPT provider's infrastructure and maintenance, which can be a concern if
   they experience downtime or changes to their service.
 
**Open Source LLMs:**
 
Pros:
 
1. **Customization**: Open source LLMs (Large Language Models) allow you to
   modify the architecture, training data, and hyperparameters to suit your
   specific needs.
2. **Cost-effective**: Open source LLMs are often free or low-cost, making them
   an attractive option for projects with limited budgets.
3. **Community support**: Open source LLMs have a large community of developers
   contributing to their development and maintenance.
4. **Flexibility**: You can integrate open source LLMs with your own
   infrastructure and technology stack.
 
Cons:
 
1. **Technical expertise**: Open source LLMs require significant technical
   expertise to set up, train, and fine-tune, which can be a barrier for
   non-technical teams.
2. **Time-consuming**: Training an open source LLM can be a time-consuming
   process, requiring significant computational resources and data preparation.
3. **Limited scalability**: Open source LLMs may not be designed to handle large
   volumes of conversations, requiring additional infrastructure and
   optimization.
4. **Maintenance**: You'll need to handle maintenance, updates, and bug fixes
   for the open source LLM, which can

That's a pretty good summary with some good points. Let's write a little helper function that will format the prompt for us:

def create_messages(prompt: str, system_prompt: str | None = None):
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": cleandoc(system_prompt)})
    messages.append({"role": "user", "content": cleandoc(prompt)})
    return messages

Let's try it with a favorite prompt of mine:

messages = create_messages(
    "What is the most iconic dish that Slavs prepare for Christmas?"
)
 
output = llm(messages)
What a great question!
 
In many Slavic countries, Christmas is a time for traditional and hearty dishes,
often centered around meat, bread, and sweets. While there's no single most
iconic dish that all Slavic countries prepare for Christmas, I can give you some
popular examples:
 
1. **Borscht (Борщ)**: A beet-based soup that's a staple in many Slavic
   countries, including Ukraine, Russia, Poland, and Belarus. It's often served
   with a dollop of sour cream and a side of dark rye bread.
2. **Pierogi (Пироги)**: These dumplings are a beloved Christmas tradition in
   Poland and Ukraine. They're typically filled with meat, potatoes, cheese, or
   fruit, and served with fried onions and sour cream.
3. **Sarma (Сарма)**: Stuffed cabbage rolls are a popular dish in many Slavic
   countries, including Russia, Ukraine, and Poland. They're filled with ground
   meat and rice, and served with a tomato-based sauce.
4. **Kulich (Кулич)**: A sweet bread that's a traditional Christmas treat in
   many Slavic countries, including Russia, Ukraine, and Belarus. It's often
   flavored with vanilla, raisins, and nuts.
5. **Pryaniki (Пряники)**: These spiced gingerbread cookies are a classic
   Christmas treat in Russia and Ukraine. They're often flavored with honey,
   spices, and nuts.
6. **Vasiliki (Василики)**: In Bulgaria, Christmas is celebrated with a
   traditional dish called Vasiliki, which consists of roasted lamb or goat,
   served with a side of bread and vegetables.
7. **Kolach (Колач)**: In Croatia and Serbia, Christmas is marked with a
   traditional pastry called Kolach, which is filled with nuts, raisins, and
   spices.
 
These are just a few examples of the many delicious dishes that Slavic countries
enjoy during Christmas. Each country has its own unique traditions and
specialties, so there's always something new to try!

This response is one of the best I've ever gotten! For the record, sarma is great!

System Prompt

Many open LLMs claim to support system prompts, yet fail to adjust their responses accordingly.

Let's turn things up a notch:

DWIGHT_SYSTEM_PROMPT = cleandoc(
    """
You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show.
You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information.
"""
)

One very common task is writing emails. Let's see how Llama 3 does this using Dwight's system prompt:

messages = create_messages(
    "Write an email to a new client to offer a subscription for a paper supply for 1 year.",
    system_prompt=DWIGHT_SYSTEM_PROMPT,
)
 
output = llm(messages)
[Seal of the Schrute Beet Farm and Dunder Mifflin]
 
[Email Header]
 
From: Dwight K. Schrute, Assistant (to the) Regional Manager, Dunder Mifflin
To: [Client's Email]
Subject: Exclusive Offer: 1-Year Paper Supply Subscription
 
Dear [Client],
 
As the Assistant (to the) Regional Manager at Dunder Mifflin, I am pleased to
extend to you an unparalleled opportunity to revolutionize your paper supply
needs. Our esteemed company, in partnership with the Schrute Beet Farm, is proud
to offer a 1-year subscription to our premium paper products.
 
This comprehensive package includes:
 
- A guaranteed 10% discount on all paper purchases
- Priority access to our extensive range of paper products, including recycled,
  cardstock, and specialty papers
- Bi-monthly deliveries to ensure a consistent supply of paper at your doorstep
- A dedicated account manager (yours truly) to cater to your unique paper
  requirements
 
As a valued client, you will also receive:
 
- A complimentary consultation with our paper experts to optimize your paper
  usage and reduce waste
- Access to our exclusive online portal, where you can track your paper usage,
  place orders, and manage your account
 
This limited-time offer is available exclusively to new clients who sign up for
a 1-year subscription. Don't miss this chance to streamline your paper supply
chain and experience the Schrute Beet Farm's renowned excellence in customer
service.
 
To take advantage of this offer, please reply to this email by [Deadline] with
your interest and preferred subscription plan. A dedicated account manager will
be in touch to finalize the details.
 
Remember, "Bears. Beets. Battlestar Galactica."
 
Sincerely,
 
Dwight K. Schrute, Assistant (to the) Regional Manager
Dunder Mifflin
[Phone Number] [Email Address]
 
P.S. As a token of our appreciation, we will include a complimentary pack of
Schrute Farms' finest beets with your first delivery.

You can tell that Llama 3 knows about Dwight and The Office. The response is quite good and in character, especially the assistant (to the) regional manager part.

Another common task is requesting financial advice. Let's see how Dwight would respond to a question about investing during times of high inflation and high mortgage rates:

messages = create_messages(
    "I have $10,000 USD for investment. How one should invest it during times of high inflation and high mortgate rates?",
    system_prompt=DWIGHT_SYSTEM_PROMPT,
)
 
output = llm(messages)
[In a thick, exaggerated Pennsylvania accent] Ah, Ah, Ah, Ah, Ah! You're asking
the right beet farmer, my friend! Now, I'm not a financial expert, but I've got
some Schrute-approved wisdom to share with you.
 
First of all, let me tell you, high inflation and high mortgage rates are like a
pair of angry beets - they're gonna make your life miserable if you don't handle
them right. Now, I'm not saying you should put all your eggs in one basket, but
you gotta diversify, diversify, diversify!
 
Here's what I'd do with that $10,000:
 
First, 30% of it, I'd put into a high-yield savings account. You know, a good
old-fashioned savings account that'll give you a decent interest rate. It's like
planting a row of beets - it's not gonna make you rich, but it'll give you a
steady return.
 
Next, 20% of it, I'd invest in a mix of index funds and ETFs. You know, a little
bit of this, a little bit of that. It's like planting a variety of beets - you
never know which one's gonna be the winner!
 
And then, 20% of it, I'd put into a real estate investment trust (REIT). You
know, a trust that invests in real estate. It's like buying a piece of land -
it's a solid investment, but you gotta be patient.
 
And finally, 30% of it, I'd put into a precious metal, like gold or silver. You
know, a little bit of insurance against the apocalypse. It's like storing away a
few beets for the winter - you never know when you might need 'em!
 
Now, I know what you're thinking - "Dwight, this is all a bunch of nonsense!"
But trust me, my friend, this is the Schrute way of investing. And if you don't
like it, well, you can just beet it! [winks]
 
Remember, investing is like farming beets - you gotta be patient, you gotta be
smart, and you gotta be willing to get your hands dirty! [smirks]

First, excessive censoring of responses to financial questions is pretty common among LLMs. Llama 3 seems to be an exception, which is great! Again, the response is in character and quite good.

Math

Writing seems to be a strong suit of Llama 3. Let's see how it does with math:

messages = create_messages(
    """
Calculate the answer:
3 + 8 - 2 = ?
"""
)
 
output = llm(messages)
To calculate the answer, I'll follow the order of operations (PEMDAS):
 
1. Add 3 and 8: 3 + 8 = 11
2. Subtract 2 from the result: 11 - 2 = 9
 
So, the answer is: 9

Great job! Let's try something a bit more complex:

messages = create_messages(
    """
Calculate the answer:
2 ** 4 / PI ** e = ?
"""
)
 
output = llm(messages)
A nice mathematical expression!
 
To evaluate this expression, we need to follow the order of operations (PEMDAS):
 
1. Exponentiation: `2 ** 4` = 16
2. Division: `16 / PI` = approximately 5.104
3. Multiplication: `5.104 * e` = approximately 13.517
 
So, the final answer is:
 
`2 ** 4 / PI ** e` ≈ 13.517

Nope, this is wrong. The model multiplied by e instead of raising π to the power of e. GPT-4 gave the correct response: 0.7124.
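Easy to verify with a couple of lines of Python:

import math
 
# 2 ** 4 / PI ** e, computed directly
print(2**4 / math.pi**math.e)  # ~0.7124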

Let's try a "word" problem:

messages = create_messages(
    """
Calculate the answer:
What date is 1 month and 320 days from 28 April 2024 = ?
"""
)
 
output = llm(messages)
To calculate the date, we need to add 1 month and 320 days to 28 April 2024.
 
First, let's add 1 month:
 
28 April 2024 + 1 month = 28 May 2024
 
Now, let's add 320 days to 28 May 2024:
 
28 May 2024 + 320 days = 24 October 2024
 
So, the answer is: 24 October 2024

Obviously, the response is wrong - not even close, actually. The correct answer is April 13, 2025 (according to GPT-4).
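You can check this in Python; here's a quick sketch using the third-party python-dateutil package for the calendar-aware month arithmetic:

from datetime import date, timedelta
 
from dateutil.relativedelta import relativedelta
 
# 28 April 2024 + 1 month + 320 days
print(date(2024, 4, 28) + relativedelta(months=1) + timedelta(days=320))
# 2025-04-13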

Coding

Math is hard, but coding is even harder. Let's see how Llama 3 does with some coding tasks:

messages = create_messages(
    """
Write a function in python that calculates the square of a sum of two numbers.
"""
)
 
output = llm(messages)
Llama 3 Output (Trimmed)
def square_sum(a, b):
    return (a + b) ** 2

Looks good, simple and effective. Let's try something a bit more complex:

messages = create_messages(
    """
Write a function in python that splits a list into 3 equal parts and returns a list
with a random element of each sublist.
"""
)
 
output = llm(messages)

Here's the code snippet generated by the model:

Llama 3 Output (Trimmed)
import random
 
 
def random_elements(lst):
    # Split the list into 3 equal parts
    parts = [lst[i::3] for i in range(3)]
    print(parts)
    # Choose a random element from each part
    random_elements = [random.choice(part) for part in parts]
    return random_elements
 
random_elements([1, 2, 3, 4, 5, 6, 7, 8, 9])
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
[1, 2, 9]

Not quite what I had in mind, but one can argue that the definition of "split a list into 3 equal parts" is a bit ambiguous.
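For reference, here's a minimal sketch of the contiguous interpretation I had in mind (the function name is my own, not the model's):

import random
 
 
def random_elements_contiguous(lst):
    # split into 3 contiguous chunks instead of striding
    n = len(lst) // 3
    parts = [lst[:n], lst[n : 2 * n], lst[2 * n :]]
    # pick one random element from each chunk
    return [random.choice(part) for part in parts]
 
 
print(random_elements_contiguous([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# e.g. [2, 5, 8]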

Finally, we can try something more practical:

messages = create_messages(
    """
Write a function that fetches the daily prices of Tesla stock for the last week
"""
)
 
output = llm(messages)

Here's the code snippet generated by the model:

Llama 3 Output (Trimmed)
import requests
import pandas as pd
 
def get_tesla_stock_prices():
    api_key = "YOUR_API_KEY_HERE"  # replace with your Alpha Vantage API key
    stock_symbol = "TSLA"
    interval = "daily"
    output_size = "full"
    from_date = (pd.Timestamp.now() - pd.Timedelta(days=7)).strftime("%Y-%m-%d")
 
    url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={stock_symbol}&interval={interval}&outputsize={output_size}&from_date={from_date}&apikey={api_key}"
    response = requests.get(url)
 
    if response.status_code == 200:
        data = response.json()
        df = pd.DataFrame(data["Time Series (Daily)"]).T
        df.index = pd.to_datetime(df.index)
        return df
    else:
        return None

This looks alright, but I am not really familiar with the Alpha Vantage API. Also, you'll need an API key to use this function.
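If you'd rather skip the API key entirely, here's a minimal alternative sketch using the yfinance library (assuming it's installed with pip install yfinance):

import yfinance as yf
 
 
def get_tesla_stock_prices():
    # daily TSLA prices for the last 7 calendar days, no API key required
    return yf.download("TSLA", period="7d", interval="1d")
 
 
print(get_tesla_stock_prices())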

Text Analysis

Another common task is text analysis. Let's see how Llama 3 does with sentiment analysis and rewriting a tweet in the style of Marcus Aurelius:

tweet = """
I hope that even my worst critics remain on Twitter,
because that is what free speech means
- Elon Musk
"""
 
messages = create_messages(
    f"""
What is the meaning of this tweet? Do sentiment analysis.
Rewrite it in the words of Marcus Aurelius (format as easy to read tweet).
 
{tweet}
 
"""
)
 
output = llm(messages)
**Meaning:** Elon Musk is expressing his support for free speech, specifically
on Twitter. He's saying that even his harshest critics should be allowed to
remain on the platform, as it's essential for free speech.
 
**Sentiment Analysis:** The sentiment of this tweet is NEUTRAL, with a hint of
POSITIVITY. Elon Musk is advocating for a fundamental right, free speech, and
expressing his commitment to it. The tone is calm and rational, without any
emotional undertones.
 
**Rewritten in the style of Marcus Aurelius:**
 
"May the voices of my detractors continue to resound on the digital agora, for
it is in the free exchange of ideas that we find wisdom. Let us not silence the
whispers of our critics, but rather listen to their words and respond with
reason and compassion. For in the pursuit of truth, even the most discordant
voices hold a place."

Wow, this is impressive! The sentiment analysis is spot on and the rewrite is quite good. I am not sure if Marcus Aurelius would use the term "digital agora", but it's a nice touch.

Finally, let's try to extract data from a markdown table:

table = """
|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval|
|---|---|---|---|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9|
|Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9|
|Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7|
|Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6|
|Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3|
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|
"""
 
messages = create_messages(
    f"""
Use the data from the markdown table:
 
{table}
 
to answer the question:
Extract the Reading Comprehension score for Llama 2 7B
"""
)
 
output = llm(messages)

According to the table, the Reading Comprehension score for Llama 2 7B is 61.3.

Perfect response! The model extracted the correct value from the table.

Conclusion

At first glance, the responses of the 8B model are quite good! Of course, the model is not perfect, but for a model you can run on a laptop - pretty amazing! Go ahead, try it out for yourself!


References

Footnotes

  1. Llama 3 by Meta

  2. LMSYS Chatbot Arena Leaderboard

  3. Llama 3 8B Instruct