Steps

  1. Define your task
    1. Expected Input/Output behavior
      1. often helpful to come up with 3-4 examples of the inputs and outputs of your program
      2. consider quality and cost specs
  2. Define your pipeline
    1. What should your DSPy program do?
      1. can it just be a simple chain of thought step?
      2. do you need the LM to use retrieval?
      3. is there a typical workflow for solving your problem in multiple well-defined steps?
      4. do you want a fully open-ended LM for your task?
    2. Almost every task should probably start with just a single dspy.ChainOfThought module
      1. and add complexity incrementally as you go
  3. Explore a few examples
    1. DSPy will help you optimize the instructions, few shot examples, and even weights of your LM calls below, but understanding where things go wrong in zero-shot usage will go a long way.
    2. Record the interesting (both easy and hard) examples you try: even if you don’t have labels, simply tracking the inputs you tried will be useful for DSPy optimizers.
  4. Define your data
    1. More formally declare your training and validation data for DSPy evaluation and optimization.
    2. You can use DSPy optimizers usefully with as few as 10 examples, but having 50-100 examples is better.
    3. You can almost always find somewhat adjacent datasets, e.g. on HuggingFace
  5. Define your metric
    1. What makes outputs from your system good or bad? Invest in defining metrics and improving them over time incrementally. It’s really hard to consistently improve what you aren’t able to define.
    2. A metric is just a function that takes examples from your data and the output of your system, and returns a score quantifying how good the output is.
    3. For simple tasks, this could just be “accuracy”, “exact match”, or “F1” score.
    4. For most applications, your system will output long-form outputs.
    5. There, your metric should probably be a smaller DSPy program that checks multiple properties of the output (quite possibly using AI feedback from LMs).
    6. Getting this right on the first try is unlikely, but you should start with something simple and iterate.
    7. If your metric is itself a DSPy program, notice that one of the most powerful ways to iterate is to compile (optimize) your metric itself.
    8. That’s usually easy because the metric’s output is usually a simple value, so the metric’s own metric is easy to define and optimize by collecting a few examples.
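A minimal metric along these lines might look like the following sketch; the SimpleNamespace objects are stand-ins for a dspy.Example and a module’s prediction, and the field name `answer` is an assumption:

```python
from types import SimpleNamespace

def answer_exact_match(example, pred, trace=None):
    # DSPy metrics receive a gold example and the system's output,
    # and return a score (here, a boolean for normalized exact match).
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Stand-ins for a gold dspy.Example and a prediction:
gold = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer=" paris ")
print(answer_exact_match(gold, pred))  # → True
```

For long-form outputs, this function would instead call a small DSPy program that checks several properties, as described above.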
  6. Collect preliminary “zero-shot” evaluations
    1. Now that you have some data and a metric, run evaluation on your pipeline before any optimizer runs.
    2. Look at the outputs and the metric scores. This will probably allow you to spot any major issues, and it will define a baseline for your next step.
  7. Compile with DSPy optimizer
    1. Given some data and a metric, we can now optimize the program we built.
    2. DSPy includes many optimizers that do different things. DSPy optimizers will create examples of each step, craft instructions, and/or update LM weights.
    3. In general, you don’t need labels for your pipeline steps, but your data examples need input values and whatever labels your metric requires (e.g. no labels if your metric is reference-free, but final-output labels otherwise, in most cases).
      1. General Guidance:
        1. if you have very little data, e.g. 10 examples of your task, use BootstrapFewShot
        2. if you have slightly more data (50-100 examples), use BootstrapFewShotWithRandomSearch
        3. if you have ~300 examples, use BayesianSignatureOptimizer
        4. if you have been able to use one of these with a large LM and need a very efficient program, compile it down to a small LM with BootstrapFinetune
  8. Iterate (questions to ask)
    1. Did you define your task well?
    2. Do you need to collect (or find online) more data for your problem?
    3. Do you want to update your metric?
    4. And do you want to use a more sophisticated optimizer?
    5. Do you need to consider advanced features like DSPy Assertions?
    6. Or perhaps most importantly, do you want to add more complexity or steps in your DSPy program?
    7. Do you want to use multiple optimizers in a sequence?
    8. Iterative development is key; DSPy gives you the pieces to do that incrementally: iterate on your data, your program structure, your assertions, your metric, and your optimization steps.
    9. Optimizing complex LM programs is an entirely new paradigm that only exists in DSPy at the time of writing, so naturally the norms around what to do are still emerging.

Language Models

The most powerful features in DSPy revolve around algorithmically optimizing the prompts (or weights) of LMs, especially when you’re building programs that use LMs within a pipeline.

In DSPy, all LM calls are cached. If you repeat the same call, you will get the same outputs. If you change the inputs or configurations, you will get new outputs.

You can specify how many outputs to generate using the n parameter.


Signatures

When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature.

A signature is a declarative specification of input/output behavior of a DSPy module.