Update on Reflection-70B
Reproducing Reflection-70B benchmark scores and postmortem on what happened.
By Sahil Chaudhary
This month we have mostly been working on behind-the-scenes work such as telemetry, refactoring, and performance improvements. However, we have also made some big changes to the dataset explorer and are working on making models independent of datasets.
We have built a new site called explore.glaive.ai that allows you to browse published datasets and see the kinds of synthetic datasets Glaive can produce. We have also added a new base model called Phi Mini 128k.
We have built a little tutorial that should help you generate your first dataset and actually get to see some synthetic data. We have also finished the backend work to allow for publishing datasets and we are working on 'explore.glaive.ai' right now!
On Tuesday, we opened up the platform for anyone to use! We're excited to see what you build with Glaive, and we're looking forward to hearing your feedback. The past month has been a lot of prep work for a public launch, including a lot of stability improvements and bug fixes.
Today, we are excited to announce the public beta for Glaive. Starting today, you can sign up and use Glaive to build and improve language models.
This week, our ML team was hard at work on a behind-the-scenes project that we hope to share more on soon. In the meantime, we shipped an overhaul of the sources system that adds a suite of new functionality, as well as some quality of life & UI changes to improve the overall custom source experience. Additionally, our logging API now allows for bulk uploads, which will enable future features for measuring models.
Last week, too many things were still in the oven so we skipped a diary, but now we have some hot and ready features. We finally have clustering out the door and alongside it we have a bunch of improvements to the playground.
We wanted to have clustering shipped this week, but unfortunately we ran into some issues, namely managing GPU resources. In general, managing our own GPUs has been more work than it's worth, so we have decided to move to a managed service. This will allow us to focus on the core features of the platform and not get bogged down in the details of managing a cluster.
This week we completed a major reorganization & refactor of our backend, which will enable us to build faster going forward. We also added the ability to search + filter on datasets within the dataset explorer.
This week we have added the ability to explore the data in a dataset directly in the browser, and we have also been working on a queue system to enable some new features.
This week we finally shipped an early version of external sources, which will let you optionally provide additional data to be used during synthetic generation. Additionally, Llama3 has been released, and we now support it as our second base model!
There was no Dev Diary last week because we were doing a lot of behind the scenes work, but we are back this week with a lot of new features and improvements.
Last week we talked about the complexity of a dataset and how some use cases require higher complexity than others. This week we shipped complexity controls, as well as the ability to edit the knowledge graph with custom nodes.
Last week we showed off how we are going to let you design the perfect dataset for your use-case using the idea of a schema to describe what your generated data should look like. This week we talk about the schema builder, get into how these schemas will help make better datasets, and how we are thinking about complexity in datasets.
The number one thing we hear from people is that they want more control over their datasets and models. Last week we gave some insight into what data your model 'knows', but this week the focus is on controlling the process of generating that synthetic knowledge.
Welcome to the inaugural Glaive Dev Diary! We are aiming to put these out weekly to give you a sense of what we accomplished in the past week and what is coming up next.
Today, we are excited to announce that Glaive has raised a $3.5M seed round to move towards our goal of democratizing access to AI by enabling the use of small, use-case specific language models trained on synthetic data.