Dev Diary #5 - Versions and Edits!
2024-04-13
There was no Dev Diary last week because we were doing a lot of behind-the-scenes work, but we are back this week with a lot of new features and improvements.
Versions
Internally, we had to do a little bit of rearchitecting, which gave us the opportunity to rethink how we handle datasets and versions. We have now moved from a project -> dataset -> version structure to a dataset -> version structure. This change makes it easier to manage and understand what is happening with your datasets and models. You can now see the status of each version and model, and it is easier to navigate between them.
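To make the new shape concrete, here is a minimal sketch of a dataset -> version hierarchy. It is purely illustrative: the class names, fields, and the "pending" status value are assumptions made for this example, not the platform's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    number: int
    # Hypothetical status value; the real set of statuses is not specified here.
    status: str = "pending"


@dataclass
class Dataset:
    # A dataset is now the top-level object; there is no enclosing project.
    name: str
    versions: List[Version] = field(default_factory=list)

    def new_version(self) -> Version:
        """Append the next version, numbered from 1, and return it."""
        version = Version(number=len(self.versions) + 1)
        self.versions.append(version)
        return version

    def latest(self) -> Optional[Version]:
        """Return the most recent version, or None if there are none yet."""
        return self.versions[-1] if self.versions else None
```

With one fewer level, finding the current state of a dataset is a single hop: `dataset.latest().status`.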
Editing Datasets
While most of our efforts were focused on letting you design great datasets, there was no good way to iterate on them short of making a brand new one. We have shipped an early version of dataset editing, which lets you iterate on your datasets and improve them over time. This is a big step towards making the platform more usable, and we are excited to see how you use it.
Conditional Sample Edits
The first type of edit we have shipped lets you add or remove a number of samples from the dataset based on a condition. Say you have a classifier model that rates on a scale of 1-5 but never seems to predict 3. You can now add more samples that should be rated 3 to the dataset to improve the model in that area.
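As a rough mental model, a conditional sample edit can be thought of as "add or remove N samples that match a predicate". The sketch below assumes samples are plain dictionaries and uses a hypothetical `apply_conditional_edit` helper; it is not how the platform implements edits, and in practice the new samples would be generated rather than supplied by a callback.

```python
def apply_conditional_edit(samples, condition, delta, make_sample=None):
    """Return a new list with `delta` samples added (delta > 0) or with up to
    `-delta` samples matching `condition` removed (delta < 0).

    `make_sample` is a hypothetical stand-in for sample generation.
    """
    if delta > 0:
        if make_sample is None:
            raise ValueError("make_sample is required when adding samples")
        return list(samples) + [make_sample() for _ in range(delta)]

    to_remove = -delta
    kept = []
    for sample in samples:
        # Drop matching samples until the requested number has been removed.
        if to_remove > 0 and condition(sample):
            to_remove -= 1
            continue
        kept.append(sample)
    return kept


# Example: the classifier never predicts 3, so add two samples rated 3.
samples = [
    {"text": "a", "rating": 1},
    {"text": "b", "rating": 5},
    {"text": "c", "rating": 5},
]
more_threes = apply_conditional_edit(
    samples,
    condition=lambda s: s["rating"] == 3,
    delta=2,
    make_sample=lambda: {"text": "<generated>", "rating": 3},
)
```

The same helper covers the opposite case: a negative `delta` trims an over-represented rating instead of topping up a missing one.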
Knowledge Edits
Another edit we have shipped is the ability for you to expand the knowledge a dataset will be generated from by providing additional keyphrases. For example, if you have a model that is supposed to parse code but it is struggling with a certain type of programming problem, you can add keyphrases that are related to that problem to the dataset to improve the model in that area.
This is all part of a larger effort to give you more control over the sources your data is generated from, and soon you will be able to specify not only keyphrases but also websites and even specific files.
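One way to picture a knowledge edit is as merging new keyphrases into the list a dataset already draws on. The `add_keyphrases` function below is a hypothetical illustration of that idea (trimming whitespace and skipping case-insensitive duplicates); how keyphrases are actually stored and deduplicated on the platform is an assumption here.

```python
def add_keyphrases(existing, new):
    """Return `existing` plus any phrases from `new` not already present,
    compared case-insensitively and ignoring surrounding whitespace."""
    seen = {phrase.casefold() for phrase in existing}
    merged = list(existing)
    for phrase in new:
        cleaned = phrase.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


# Example: a code-parsing model struggles with one kind of problem.
keyphrases = add_keyphrases(
    ["parsing"],
    ["Recursion", " parsing ", "recursion", ""],
)
```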
Changelog
Web App
- Rearranged projects -> datasets, datasets -> versions.
- Statuses of versions and models are now more visible.
- Added the ability to edit a dataset.
- Playground should now actually work.
Terms and Conditions
We now have terms and conditions plus a privacy policy. You can find them here and here.
Fixes
- Fixed playground using modelIds instead of names.
- Made it more obvious you can have multiline templates in the dataset designer.