From the team

Learn more about the latest developments at Glaive

Update on Reflection-70B

Reproducing Reflection-70B benchmark scores and postmortem on what happened.

By Sahil Chaudhary

Dev Diary #15 - DuckDB and Independent Models

This month we have mostly been working on behind-the-scenes improvements like telemetry, refactoring, and performance improvements. However, we have also made some big changes to the dataset explorer and are working on making models independent of datasets.

Dev Diary #14 - Public Dataset Explorer & Phi

We have built a new site called explore.glaive.ai that allows you to browse published datasets and see the kinds of synthetic datasets Glaive can produce. We have also added a new base model called Phi Mini 128k.

Dev Diary #13 - Tutorial

We have built a little tutorial that should help you generate your first dataset and actually get to see some synthetic data. We have also finished the backend work to allow for publishing datasets and we are working on 'explore.glaive.ai' right now!

Dev Diary #12 - A fistful of fixes

On Tuesday, we opened up the platform for anyone to use! We're excited to see what you build with Glaive, and we're looking forward to hearing your feedback. The past month has been a lot of prep work for a public launch, including a lot of stability and bug fixes.

Open Beta

Today, we are excited to announce the public beta for Glaive. Starting today you can sign up and use Glaive to build and improve language models.

Dev Diary #11 - Sources v2 and Logging API

This week, our ML team was hard at work on a behind-the-scenes project that we hope to share more on soon. In the meantime, we shipped an overhaul of the sources system that adds a suite of new functionality, as well as some quality of life & UI changes to improve the overall custom source experience. Additionally, our logging API now allows for bulk uploads, which will enable future features for measuring models.

Dev Diary #10 - Clusters and Playground v2

Last week, too many things were still in the oven so we skipped a diary, but now we have some hot and ready features. We finally have clustering out the door and alongside it we have a bunch of improvements to the playground.

Dev Diary #9 - Notifications and Logging API

We wanted to have clustering shipped this week, but unfortunately we ran into some issues, namely managing GPU resources. In general, managing our own GPUs has been more work than it's worth, so we have decided to move to a managed service. This will allow us to focus on the core features of the platform and not get bogged down in the details of managing a cluster.

Dev Diary #8 - Spring Cleaning

This week we completed a major reorganization and refactor of our backend, which will enable us to build faster going forward. We also added the ability to search and filter datasets within the dataset explorer.

Dev Diary #7 - Exploring Versions

This week we have added the ability to explore the data in a dataset directly in the browser, and we have also been working on a queue system to enable some new features.

Dev Diary #6 - Llama3 and External Sources

This week we finally shipped an early version of external sources, which will let you optionally provide additional data to be used during synthetic generation. Additionally, Llama3 has been released, and we now support it as our second base model!

Dev Diary #5 - Versions and Edits!

There was no Dev Diary last week because we were doing a lot of behind-the-scenes work, but we are back this week with a lot of new features and improvements.

Dev Diary #3 - Building Schemas to Control Your Datasets

Last week we showed off how we are going to let you design the perfect dataset for your use-case using the idea of a schema to describe what your generated data should look like. This week we talk about the schema builder, get into how these schemas will help make better datasets, and how we are thinking about complexity in datasets.

Dev Diary #2 - Designing Synthetic Datasets

The number one thing we hear from people is that they want more control over their datasets and models. Last week we gave some insight into what data your model 'knows', but this week the focus is on controlling the process of generating that synthetic knowledge.

Dev Diary #1 - Welcome to Glaive!

Welcome to the inaugural Glaive Dev Diary! We are aiming to put these out weekly to give you a sense of what we accomplished in the past week and what is coming up next.

Announcing our $3.5M seed round

Today, we are excited to announce that Glaive has raised a $3.5M seed round to move towards our goal of democratizing access to AI by enabling the use of small, use-case specific language models trained on synthetic data.