ETMI5: Explain to Me in 5

This part of our course content covers the intricacies of prompting and prompt engineering for LLMs. Prompting is the technique of crafting precise instructions to elicit specific responses from LLMs, crucial for their effective use.

Prompt engineering is an evolving discipline aimed at optimizing these prompts to enhance model performance across various tasks. The importance of prompting lies in its ability to guide LLMs towards producing contextually appropriate and accurate outputs, leveraging their training and understanding of language patterns.

The content also touches on the challenges, including potential biases and the risk of hallucination, and the need for techniques to detect and mitigate such issues.

Additionally, we briefly go over tools and advanced methods developed for prompt engineering, underscoring the dynamic and collaborative nature of this field in harnessing LLM capabilities.

Introduction

Prompting

"prompting" refers to the art and science of formulating precise instructions or queries provided to the model to generate desired outputs. It's the input—typically in the form of text—that users present to the language model to elicit specific responses. The effectiveness of a prompt lies in its ability to guide the model's understanding and generate outputs aligned with user expectations.

Prompt Engineering

[Image: illustration of prompt engineering for LLMs]

                Image Source: [Zapier](<https://zapier.com/blog/prompt-engineering/>)

Why Prompting?

Large language models are trained through a process called unsupervised learning on vast amounts of diverse text data. During training, the model learns to predict the next word in a sentence based on the context provided by the preceding words. This process allows the model to capture grammar, facts, reasoning abilities, and even some aspects of common sense.

Prompting is a crucial aspect of using these models effectively. Here's why prompting LLMs the right way is essential:

  1. Contextual Understanding: LLMs are trained to understand context and generate responses based on the patterns learned from diverse text data. When you provide a prompt, it's crucial to structure it in a way that aligns with the context the model is familiar with. This helps the model make relevant associations and produce coherent responses.
  2. Training Data Patterns: During training, the model learns from a wide range of text, capturing the linguistic nuances and patterns present in the data. Effective prompts leverage this training by incorporating similar language and structures that the model has encountered in its training data. This enables the model to generate responses that are consistent with its learned patterns.
  3. Transfer Learning: LLMs utilize transfer learning. The knowledge gained during training on diverse datasets is transferred to the task at hand when prompted. A well-crafted prompt acts as a bridge, connecting the general knowledge acquired during training to the specific information or action desired by the user.
  4. Contextual Prompts for Contextual Responses: By using prompts that resemble the language and context the model was trained on, users tap into the model's ability to understand and generate content within similar contexts. This leads to more accurate and contextually appropriate responses.
  5. Mitigating Bias: The model may inherit biases present in its training data. Thoughtful prompts can help mitigate bias by providing additional context or framing questions in a way that encourages unbiased responses. This is crucial for aligning model outputs with ethical standards.

To summarize, the training of LLMs involves learning from massive datasets, and prompting is the means by which users guide these models to produce useful, relevant, and policy-compliant responses. It's a collaborative process where users and models work together to achieve the desired outcome. There’s also a growing field called adversarial prompting which involves intentionally crafting prompts to exploit weaknesses or biases in a language model, with the goal of generating responses that may be misleading, inappropriate, or showcase the model's limitations. Safeguarding models from providing harmful responses is a challenge that needs to be solved and is an active research area.

Prompting Basics

The basic principles of prompting involve the inclusion of specific elements tailored to the task at hand. These elements include:

  1. Instruction: Clearly specify the task or action you want the model to perform. This sets the context for the model's response and guides its behavior.
  2. Context: Provide external information or additional context that helps the model better understand the task and generate more accurate responses. Context can be crucial in steering the model towards the desired outcome.
  3. Input Data: Include the input or question for which you seek a response. This is the information on which you want the model to act or provide insights.
  4. Output Indicator: Define the type or format of the desired output. This guides the model in presenting the information in a way that aligns with your expectations.

Here's an example prompt for a text classification task:

Prompt:

Classify the text into neutral, negative, or positive
Text: I think the food was okay.
Sentiment:

In this example:

  1. Instruction: "Classify the text into neutral, negative, or positive" tells the model what task to perform.
  2. Input Data: "Text: I think the food was okay." is the text the model should act on.
  3. Output Indicator: "Sentiment:" signals where and in what form the answer should be produced.

Note that this example doesn't explicitly use context, but context can also be incorporated into the prompt to provide additional information that aids the model in understanding the task better.

It's important to highlight that not all four elements are always necessary for a prompt, and the format can vary based on the specific task. The key is to structure prompts in a way that effectively communicates the user's intent and guides the model to produce relevant and accurate responses.
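
As a minimal sketch of how these elements can be assembled programmatically, the snippet below builds a prompt string from the four components. The `build_prompt` helper is purely illustrative (it is not part of any particular library), and the sentiment example mirrors the prompt above.

```python
# A minimal sketch: composing a prompt from the four basic elements.
# The build_prompt helper is illustrative, not a standard API.

def build_prompt(instruction: str, input_data: str, output_indicator: str,
                 context: str = "") -> str:
    """Assemble instruction, optional context, input data, and output indicator."""
    parts = [instruction]
    if context:
        parts.append(f"Context: {context}")
    parts.append(input_data)
    parts.append(output_indicator)
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Classify the text into neutral, negative, or positive",
    input_data="Text: I think the food was okay.",
    output_indicator="Sentiment:",
)
print(prompt)
```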

OpenAI has recently provided guidelines on best practices for prompt engineering using the OpenAI API. For a detailed understanding, you can explore the guidelines here; the points below give a brief summary:

  1. Use the Latest Model: For optimal results, it is recommended to use the latest and most capable models.
  2. Structure Instructions: Place instructions at the beginning of the prompt and use ### or """ to separate the instruction and context for clarity and effectiveness.
  3. Be Specific and Descriptive: Clearly articulate the desired context, outcome, length, format, style, etc., in a specific and detailed manner.
  4. Specify Output Format with Examples: Clearly express the desired output format through examples, making it easier for the model to understand and respond accurately.
  5. Use Zero-shot, Few-shot, and Fine-tune Approach: Begin with a zero-shot approach, followed by a few-shot approach (providing examples). If neither works, consider fine-tuning the model.
  6. Avoid Fluffy Descriptions: Reduce vague and imprecise descriptions. Instead, use clear instructions and avoid unnecessary verbosity.
  7. Provide Positive Guidance: Instead of stating what not to do, clearly state what actions should be taken in a given situation, offering positive guidance.
  8. Code Generation Specific - Use "Leading Words": When generating code, utilize "leading words" to guide the model toward a specific pattern or language, improving the accuracy of code generation.
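
As a rough sketch of points 2 and 4 above, the snippet below places the instruction first, separates it from the context with ### markers, and shows the desired output format through an example. It assumes the openai Python package (v1.x) is installed and OPENAI_API_KEY is set in the environment; the model name is only an example and should be replaced with whatever model you have access to.

```python
# Sketch only: structuring a prompt per the guidelines above.
# Assumes `pip install openai` (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Summarize the text below as a bullet point list of the most important points.\n"
    "###\n"
    "Text: Large language models are trained on vast text corpora and can be\n"
    "steered toward specific behaviors with carefully written prompts.\n"
    "###\n"
    "Desired format:\n"
    "- point 1\n"
    "- point 2"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute your own
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```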

💡It's also important to note that crafting effective prompts is an iterative process, and you may need to experiment to find the most suitable approach for your specific use case. Prompt patterns may also be specific to models and how they were trained (architecture, datasets used, etc.).

Explore these examples of prompts to gain a better understanding of how to craft effective prompts in different use-cases.

Advanced Prompting Techniques

Prompting techniques constitute a rapidly evolving area of research, with researchers continually exploring novel methods to effectively prompt models for optimal performance. The simplest forms of prompting are zero-shot, where only instructions are provided, and few-shot, where a handful of examples are given and the LLM is expected to follow the demonstrated pattern. More intricate techniques are described in various research papers. While the list provided here is not exhaustive, existing prompting methods can be tentatively classified into a few high-level categories. It's crucial to note that these classes are derived from current techniques and are neither exhaustive nor definitive; they are subject to evolution and modification, reflecting the dynamic nature of advancements in this field. It's also important to highlight that numerous methods fall into more than one of these classes, exhibiting overlapping characteristics to combine the benefits offered by multiple categories.

[Image: overview of the advanced prompting technique categories discussed below]

A. Step-by-Step Modular Decomposition

These methods involve breaking down complex problems into smaller, manageable steps, facilitating a structured approach to problem-solving. They guide the LLM through a sequence of intermediate steps, allowing it to focus on solving one step at a time rather than tackling the entire problem in a single step. This approach enhances the reasoning abilities of LLMs and is particularly useful for tasks requiring multi-step thinking.

Examples of methods falling under this category include:

  1. Chain-of-Thought (CoT) Prompting:

Chain-of-Thought (CoT) Prompting is a technique to enhance complex reasoning capabilities through intermediate reasoning steps. This method involves providing a sequence of reasoning steps that guide a large language model (LLM) through a problem, allowing it to focus on solving one step at a time.

In the provided example below, the prompt involves evaluating whether the sum of odd numbers in a given group is an even number. The LLM is guided to reason through each example step by step, providing intermediate reasoning before arriving at the final answer. The output shows that the model successfully solves the problem by considering the odd numbers and their sums.

[Image: standard prompting vs. chain-of-thought prompting example]

                                      Image Source: [Wei et al. (2022)](<https://arxiv.org/abs/2201.11903>)
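
A minimal sketch of a few-shot CoT prompt in the spirit of the example above: the demonstration spells out the intermediate reasoning so the model imitates it for the new question. The `call_llm` stub is a placeholder for whichever model API you use.

```python
# Few-shot chain-of-thought prompt: the demonstration includes the
# intermediate reasoning so the model continues in the same pattern.

FEW_SHOT_COT = """\
Q: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

Q: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:"""

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an API client)."""
    raise NotImplementedError

if __name__ == "__main__":
    print(call_llm(FEW_SHOT_COT))
```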

1a. Zero-shot/Few-Shot CoT Prompting:

Zero-shot CoT involves adding the phrase "Let's think step by step" to the original question to guide the LLM through a systematic reasoning process. Few-shot CoT prompting provides the model with a few worked examples of similar problems to enhance its reasoning abilities. These CoT prompts significantly improve the model's performance by explicitly instructing it to think through the problem step by step; in contrast, without the special prompt, the model often fails to provide the correct answer.

[Image: zero-shot CoT prompting with the "Let's think step by step" trigger]

                                    Image Source: [Kojima et al. (2022)](<https://arxiv.org/abs/2205.11916>)
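
As a minimal sketch of the zero-shot variant, the snippet below simply appends the trigger phrase to a question. The question text is only an example; the resulting string would be sent to whichever model you are using.

```python
# Zero-shot CoT: append the trigger phrase to the original question.
question = (
    "I went to the market and bought 10 apples. I gave 2 apples to the neighbor "
    "and 2 to the repairman. I then bought 5 more apples and ate 1. "
    "How many apples did I remain with?"
)
zero_shot_cot_prompt = question + "\n\nLet's think step by step."
print(zero_shot_cot_prompt)  # send this string to whichever model you are using
```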

1b. Automatic Chain-of-Thought (Auto-CoT):

Automatic Chain-of-Thought (Auto-CoT) was designed to automate the generation of reasoning chains for demonstrations. Instead of manually crafting examples, Auto-CoT leverages LLMs with a "Let's think step by step" prompt to automatically generate reasoning chains one by one.

[Image: Auto-CoT question clustering and demonstration sampling]

                             Image Source: [Zhang et al. (2022)](<https://arxiv.org/abs/2210.03493>)

The Auto-CoT process involves two main stages:

  1. Question Clustering: Partition questions into clusters based on similarity.
  2. Demonstration Sampling: Select a representative question from each cluster and generate its reasoning chain using Zero-Shot-CoT with simple heuristics.

The goal is to eliminate manual efforts in creating diverse and effective examples. Auto-CoT ensures diversity in demonstrations, and the heuristic-based approach encourages the model to generate simple yet accurate reasoning chains.
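
The sketch below illustrates the two stages under simplifying assumptions: TF-IDF vectors stand in for the learned sentence embeddings used in the paper, scikit-learn's KMeans does the clustering, and `call_llm` is a placeholder for the model that produces each zero-shot CoT rationale.

```python
# Auto-CoT sketch: cluster questions, then build one zero-shot CoT
# demonstration per cluster. TF-IDF + KMeans stand in for the sentence
# embeddings used in the paper; call_llm is a placeholder.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError

def auto_cot_demos(questions: list[str], n_clusters: int = 2) -> list[str]:
    # Stage 1: question clustering.
    vectors = TfidfVectorizer().fit_transform(questions)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

    # Stage 2: demonstration sampling -- take one representative question
    # per cluster and let the model generate its reasoning chain.
    demos = []
    for cluster in range(n_clusters):
        representative = next(q for q, label in zip(questions, labels) if label == cluster)
        rationale = call_llm(f"Q: {representative}\nA: Let's think step by step.")
        demos.append(f"Q: {representative}\nA: {rationale}")
    return demos
```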

Overall, these CoT prompting techniques showcase the effectiveness of guiding LLMs through step-by-step reasoning for improved problem-solving and demonstration generation.

  2. Tree-of-Thoughts (ToT) Prompting

Tree-of-Thoughts (ToT) Prompting is a technique that extends the Chain-of-Thought approach. It allows language models to explore coherent units of text ("thoughts") as intermediate steps towards problem-solving. ToT enables models to make deliberate decisions, consider multiple reasoning paths, and self-evaluate choices. It introduces a structured framework where models can look ahead or backtrack as needed during the reasoning process. ToT Prompting provides a more structured and dynamic approach to reasoning, allowing language models to navigate complex problems with greater flexibility and strategic decision-making. It is particularly beneficial for tasks that require comprehensive and adaptive reasoning capabilities.

Key Characteristics:

  1. Thoughts as units: intermediate steps are represented as coherent units of text ("thoughts") rather than single tokens.
  2. Multiple reasoning paths: the model generates and explores several candidate thoughts at each step.
  3. Self-evaluation: candidate thoughts are scored or voted on by the model itself to decide which branches to pursue.
  4. Lookahead and backtracking: search strategies (e.g., breadth-first or depth-first search) allow the model to look ahead or abandon unpromising branches.

[Image: Tree-of-Thoughts search over intermediate reasoning steps]

                                        Image Source: [Yao et al. (2023)](<https://arxiv.org/abs/2305.10601>)
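
A highly simplified breadth-first sketch of the idea: at each step the model proposes several candidate thoughts, each partial path is scored, and only the best few are kept. `propose_thoughts` and `score_path` would be implemented with model calls (candidate generation and self-evaluation); here they are placeholders.

```python
# Tree-of-Thoughts sketch: breadth-first search over partial reasoning paths.
# propose_thoughts and score_path are placeholders for model calls
# (candidate generation and self-evaluation, respectively).

def propose_thoughts(problem: str, path: list[str], k: int = 3) -> list[str]:
    """Ask the model for k candidate next thoughts given the path so far."""
    raise NotImplementedError

def score_path(problem: str, path: list[str]) -> float:
    """Ask the model to rate how promising this partial path is."""
    raise NotImplementedError

def tree_of_thoughts(problem: str, depth: int = 3, beam_width: int = 2) -> list[str]:
    frontier: list[list[str]] = [[]]  # each entry is a partial path of thoughts
    for _ in range(depth):
        candidates = [path + [t] for path in frontier
                      for t in propose_thoughts(problem, path)]
        # Keep only the most promising paths (self-evaluation + pruning).
        candidates.sort(key=lambda p: score_path(problem, p), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]  # best full reasoning path found
```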

  3. Graph-of-Thought (GoT) Prompting

This work arises from the fact that human thought processes often follow non-linear patterns, deviating from simple sequential chains. In response, the authors propose Graph-of-Thought (GoT) reasoning, a novel approach that models thoughts not just as chains but as graphs, capturing the intricacies of non-sequential thinking.

This extension introduces a paradigm shift in representing thought units. Nodes in the graph symbolize these thought units, and edges depict connections, presenting a more realistic portrayal of the complexities inherent in human cognition. Unlike traditional trees, GoT employs Directed Acyclic Graphs (DAGs), allowing the modeling of paths that fork and converge. This divergence provides GoT with a significant advantage over conventional linear approaches.

The GoT reasoning model operates in a two-stage framework. Initially, it generates rationales, and subsequently, it produces the final answer. To facilitate this, the model leverages a Graph-of-Thoughts encoder for representation learning. The integration of GoT representations with the original input occurs through a gated fusion mechanism, enabling the model to combine both linear and non-linear aspects of thought processes.

[Image: Graph-of-Thought two-stage reasoning framework]

                               Image Source: [Yao et al. (2023)](<https://arxiv.org/abs/2305.16582>)
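
As a rough illustration of the data structure only (not the paper's GoT encoder or gated-fusion mechanism), the sketch below stores thoughts as nodes of a DAG and visits them in topological order so that every thought can build on the thoughts it depends on. The thought texts are invented examples, and graphlib is from the Python standard library.

```python
# Illustration of the graph-of-thoughts data structure: thoughts are nodes,
# edges record which thoughts each one depends on, and a topological order
# guarantees prerequisites are visited first. This sketches only the DAG
# bookkeeping, not the paper's GoT encoder or gated fusion.
from graphlib import TopologicalSorter

thoughts = {
    "t1": "Extract the quantities mentioned in the question.",
    "t2": "Recall the relevant formula.",
    "t3": "Plug the quantities into the formula.",
    "t4": "State the final answer.",
}

# dependencies: node -> set of nodes it depends on (forks and merges allowed)
dependencies = {
    "t3": {"t1", "t2"},  # t3 merges two independent branches
    "t4": {"t3"},
}

for node in TopologicalSorter(dependencies).static_order():
    print(node, "->", thoughts[node])
```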

B. Comprehensive Reasoning and Verification

Comprehensive Reasoning and Verification methods in prompting entail a more sophisticated approach where reasoning is not just confined to providing a final answer but involves generating detailed intermediate steps. The distinctive aspect of these techniques is the integration of a self-verification mechanism within the framework. As the LLM generates intermediate answers or reasoning traces, it autonomously verifies their consistency and correctness. If the internal verification yields a false result, the model iteratively refines its responses, ensuring that the generated reasoning aligns with the expected logical coherence. These checks contribute to a more robust and reliable reasoning process, allowing the model to adapt and refine its outputs based on internal validation.

  1. Automatic Prompt Engineer

Automatic Prompt Engineer (APE) is a technique that treats instructions as programmable elements and seeks to optimize them by conducting a search across a pool of instruction candidates proposed by an LLM. Drawing inspiration from classical program synthesis and human prompt engineering, APE employs a scoring function to evaluate the effectiveness of candidate instructions. The selected instruction, determined by the highest score, is then utilized as the prompt for the LLM. This automated approach aims to enhance the efficiency of prompt generation, aligning with classical program synthesis principles and leveraging the knowledge embedded in large language models to improve overall performance in producing desired outputs.

[Image: Automatic Prompt Engineer (APE) workflow]

                                           Image Source: [Zhou et al. (2022)](<https://arxiv.org/abs/2211.01910>)
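
A simplified sketch of the APE loop under obvious assumptions: `call_llm` is a placeholder that both proposes candidate instructions and answers with them, and a tiny labeled evaluation set with exact-match accuracy acts as the scoring function.

```python
# APE sketch: generate candidate instructions, score each on a small labeled
# set, and keep the best-scoring one as the prompt. call_llm is a placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def propose_instructions(task_examples: list[tuple[str, str]], n: int = 5) -> list[str]:
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in task_examples)
    meta_prompt = ("I gave a friend an instruction. Based on these input/output "
                   f"pairs, what was the instruction?\n{demos}\nInstruction:")
    return [call_llm(meta_prompt) for _ in range(n)]

def score(instruction: str, eval_set: list[tuple[str, str]]) -> float:
    # Execution accuracy: fraction of eval examples answered correctly.
    hits = sum(call_llm(f"{instruction}\nInput: {x}\nOutput:").strip() == y
               for x, y in eval_set)
    return hits / len(eval_set)

def ape(task_examples, eval_set):
    candidates = propose_instructions(task_examples)
    return max(candidates, key=lambda ins: score(ins, eval_set))
```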

  2. Chain of Verification (CoVe)

The Chain-of-Verification (CoVe) method addresses the challenge of hallucination in large language models by introducing a systematic verification process. It begins with the model drafting an initial response to a user query, potentially containing inaccuracies. CoVe then plans and poses independent verification questions, aiming to fact-check the initial response without bias. The model answers these questions, and based on the verification outcomes, generates a final response, incorporating corrections and improvements identified through the verification process. CoVe ensures unbiased verification, leading to enhanced factual accuracy in the final response, and contributes to improved overall model performance by mitigating the generation of inaccurate information.

[Image: Chain-of-Verification (CoVe) pipeline]

                                  Image Source: [Dhuliawala et al. (2023)](<https://arxiv.org/abs/2309.11495>)
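
A bare-bones sketch of the CoVe steps, with `call_llm` standing in for the model: draft a baseline answer, plan verification questions, answer them independently of the draft, then revise the draft in light of the verification results.

```python
# Chain-of-Verification sketch: draft -> plan verifications -> answer them
# independently -> produce a verified final response. call_llm is a placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def chain_of_verification(query: str) -> str:
    # 1. Draft an initial (possibly inaccurate) response.
    draft = call_llm(query)

    # 2. Plan verification questions that fact-check the draft.
    plan = call_llm(f"Question: {query}\nDraft answer: {draft}\n"
                    "List verification questions to fact-check the draft, one per line.")
    questions = [q for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently of the draft.
    verifications = [f"{q}\n{call_llm(q)}" for q in questions]

    # 4. Revise the draft using the verification results.
    return call_llm(f"Question: {query}\nDraft answer: {draft}\n"
                    "Verification Q&A:\n" + "\n".join(verifications) +
                    "\nWrite a final, corrected answer.")
```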

  3. Self Consistency

Self Consistency represents a refinement in prompt engineering, specifically targeting the limitations of naive greedy decoding in chain-of-thought prompting. The core concept involves sampling multiple diverse reasoning paths using few-shot CoT and leveraging the generated responses to identify the most consistent answer. This method aims to enhance the performance of CoT prompting, particularly in tasks that demand arithmetic and commonsense reasoning. By introducing diversity in reasoning paths and prioritizing consistency, Self Consistency contributes to more robust and accurate language model responses within the CoT framework.

[Image: self-consistency over multiple sampled reasoning paths]

                                        Image Source: [Wang et al. (2022)](<https://arxiv.org/pdf/2203.11171.pdf>)
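
A compact sketch of self-consistency: sample several CoT completions at a non-zero temperature, extract the final answer from each, and return the majority answer. `sample_cot_answer` is a placeholder for the sampled model call plus answer extraction.

```python
# Self-consistency sketch: sample several diverse CoT completions and
# take a majority vote over the extracted final answers.
from collections import Counter

def sample_cot_answer(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: one sampled CoT completion, reduced to its final answer."""
    raise NotImplementedError

def self_consistency(prompt: str, n_samples: int = 10) -> str:
    answers = [sample_cot_answer(prompt) for _ in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```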

  4. ReAct

The ReAct framework combines reasoning and action in LLMs to enhance their capabilities in dynamic tasks. The framework involves generating both verbal reasoning traces and task-specific actions in an interleaved manner. ReAct aims to address the limitations of methods like chain-of-thought prompting, which lack access to the external world and can encounter issues such as fact hallucination and error propagation. Inspired by the synergy between "acting" and "reasoning" in human learning and decision-making, ReAct prompts LLMs to create, maintain, and adjust plans for acting dynamically. The model can interact with external environments, such as knowledge bases, to retrieve additional information, leading to more reliable and factual responses.

[Image: ReAct interleaving reasoning traces, actions, and observations]

                                          Image Source: [Yao et al., 2022](<https://arxiv.org/abs/2210.03629>)

How ReAct Works:

  1. Dynamic Reasoning and Acting: ReAct generates both verbal reasoning traces and actions, allowing for dynamic reasoning in response to complex tasks.
  2. Interaction with External Environments: The action step enables interaction with external sources, like search engines or knowledge bases, to gather information and refine reasoning.
  3. Improved Task Performance: The framework's integration of reasoning and action contributes to outperforming state-of-the-art baselines on language and decision-making tasks.
  4. Enhanced Human Interpretability: ReAct leads to improved human interpretability and trustworthiness of LLMs, making their responses more understandable and reliable.
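
A toy sketch of the loop described above, under clear assumptions: `call_llm` produces Thought/Action lines, a single hypothetical `search` tool plays the role of the external environment, and its result is appended to the trace as an Observation before the next step.

```python
# ReAct sketch: interleave model "Thought/Action" steps with tool
# "Observation" results until the model emits a final answer.
# call_llm and search are placeholders for a model API and an external tool.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def search(query: str) -> str:
    """Placeholder external tool, e.g. a knowledge base or web search."""
    raise NotImplementedError

def react(question: str, max_steps: int = 5) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(trace)          # expected to emit Thought/Action lines
        trace += step + "\n"
        if "Finish[" in step:           # model signals it has the final answer
            return step.split("Finish[", 1)[1].rstrip("]")
        if "Search[" in step:           # model requests an external lookup
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            trace += f"Observation: {search(query)}\n"
    return trace                        # fall back to the raw trace
```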

C. Usage of External Tools/Knowledge or Aggregation

This category of prompting methods encompasses techniques that leverage external sources, tools, or aggregated information to enhance the performance of LLMs. These methods recognize the importance of accessing external knowledge or tools for more informed and contextually rich responses. Aggregation techniques involve harnessing the power of multiple responses to enhance the robustness of the final answer. This approach recognizes that diverse perspectives and reasoning paths can contribute to more reliable and comprehensive answers. Here's an overview:

  1. Active Prompting (Aggregation)

Active Prompting was designed to enhance the adaptability of LLMs to various tasks by dynamically selecting task-specific example prompts. Chain-of-Thought methods typically rely on a fixed set of human-annotated exemplars, which may not always be the most effective for diverse tasks. Here's how Active Prompting addresses this challenge:

  1. Dynamic Querying: The LLM is queried several times on each training question to collect a set of candidate answers.
  2. Uncertainty Metric: An uncertainty score (for example, the level of disagreement among the sampled answers) is computed for each question.
  3. Selective Annotation: The most uncertain questions are selected and annotated by humans with chain-of-thought exemplars.
  4. Adaptive Learning: These newly annotated, task-specific exemplars are then used as demonstrations when prompting the model.

Active Prompting's dynamic adaptation mechanism enables LLMs to actively seek and incorporate task-specific examples that align with the challenges posed by different tasks. By leveraging human-annotated exemplars for uncertain cases, this approach contributes to a more contextually aware and effective performance across diverse tasks.
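
A simplified sketch of the selection stage under stated assumptions: each question is answered k times by the model (`sample_answer` is a placeholder), disagreement among the sampled answers serves as the uncertainty metric, and the most uncertain questions are surfaced for human CoT annotation.

```python
# Active Prompting sketch: estimate uncertainty by disagreement among sampled
# answers and pick the most uncertain questions for human CoT annotation.
# sample_answer is a placeholder for a sampled model call.

def sample_answer(question: str) -> str:
    raise NotImplementedError

def disagreement(question: str, k: int = 5) -> float:
    """Uncertainty = number of distinct sampled answers / number of samples."""
    answers = [sample_answer(question) for _ in range(k)]
    return len(set(answers)) / k

def select_for_annotation(questions: list[str], budget: int = 3) -> list[str]:
    ranked = sorted(questions, key=disagreement, reverse=True)
    return ranked[:budget]  # these get human-written reasoning chains
```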