Dev Diary #10 - Clusters and Playground v2

2024-05-24

Last week, too many things were still in the oven so we skipped a diary, but now we have some hot and ready features. We finally have clustering out the door and alongside it we have a bunch of improvements to the playground. Next week, the plan is to show off some changes to sources and start introducing better ways to test and evaluate your models.

Clusters

Back in Dev Diary #8, we mentioned that clustering was coming soon, but we didn't talk about it in detail. The goal of clustering is to give you a bird's-eye view of your dataset, allowing you to see how your samples are distributed across the knowledge space of the use case and provided sources. This can help you identify gaps in your dataset, or areas that are over-represented in it.

Here is what clusters can look like for a calculus tutor dataset

Clusters have also been integrated into the dataset explorer so you can easily filter and search for samples within a cluster. We are excited to see how you use this feature and what insights you gain from it.
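
To make the idea of gaps and over-represented areas concrete, here is a minimal sketch of how you might reason about cluster sizes yourself. It is not how our clustering works internally: the `cluster_balance` function, the thresholds, and the topic labels are all made up for illustration, and it assumes you already have one cluster label per sample.

```python
from collections import Counter


def cluster_balance(labels, low=0.5, high=2.0):
    """Label each cluster by how its size compares to an even split.

    `low` and `high` are arbitrary illustrative thresholds, expressed as
    ratios of the size a cluster would have if samples were spread evenly.
    """
    counts = Counter(labels)
    expected = len(labels) / len(counts)  # size under a perfectly even split
    report = {}
    for cluster, n in counts.items():
        ratio = n / expected
        if ratio < low:
            report[cluster] = "under-represented"
        elif ratio > high:
            report[cluster] = "over-represented"
        else:
            report[cluster] = "balanced"
    return report


# Hypothetical labels for a calculus tutor dataset (48 samples, 4 clusters).
labels = (
    ["limits"] * 2
    + ["derivatives"] * 30
    + ["integrals"] * 10
    + ["series"] * 6
)
report = cluster_balance(labels)
print(report)
```

Note that a sketch like this can only flag thin clusters, not topics with no samples at all, since those never show up as a label.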

Playground

The playground feature exists to help you quickly test & verify that your model is behaving as expected. However, for models with more complex schema structures, it can be difficult to manually type out well-formed prompts. The new schema input mode composes a friendly custom input field that matches the specific prompt schema of a model, allowing easy crafting of valid prompts.
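
To give a feel for what "well-formed" means here, below is a rough sketch of the kind of check a schema-aware input saves you from doing by hand. The schema format, the field names, and the `validate_prompt` helper are assumptions made up for this example and do not reflect our actual schema representation.

```python
def validate_prompt(prompt, schema):
    """Return a list of problems found when checking `prompt` against `schema`.

    `schema` maps field names to expected Python types. This is a
    hypothetical, simplified stand-in for a real prompt schema.
    """
    errors = []
    for field, expected_type in schema.items():
        if field not in prompt:
            errors.append(f"missing field: {field}")
        elif not isinstance(prompt[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    for field in prompt:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# Hypothetical schema for a calculus tutor model.
schema = {"question": str, "difficulty": int, "show_steps": bool}

good = {"question": "Differentiate x^2", "difficulty": 2, "show_steps": True}
bad = {"question": "Differentiate x^2", "difficulty": "hard", "topic": "derivatives"}

print(validate_prompt(good, schema))
print(validate_prompt(bad, schema))
```

With a hand-typed prompt, mistakes like the wrong type for `difficulty` or a forgotten `show_steps` are easy to make; a generated input form rules them out up front.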

We also shipped the ability to compare the outputs of different models side-by-side in the playground. This can be useful for comparing the performance of different models on the same prompt. For now, this is limited to models that share the same schema and belong to the same dataset.

Here is what the playground looks like with schema input mode

Changelog

  • Added clustering to the dataset explorer.
  • Added schema input mode to the playground.
  • Added the ability to compare the outputs of different models in the playground.
  • Added notifications for dataset and model failures.
  • Enabled notifications for teams.
  • Fixed a bunch of bugs related to dataset and model statuses.