Data Hygiene
- Avoid Overlap: Ensure your Fine-tune Dataset shares no examples with your Test Set. Train on one set of data and evaluate on a completely separate set; otherwise test scores reflect memorization rather than how well the model generalizes.
- Curate for Quality: Remove incorrect, ambiguous, or low-value examples.
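One way to check for overlap outside the product is a small script that compares normalized prompts between the two sets. This is a minimal sketch, not the platform's own method: the `prompt` and `response` field names, the `normalize` rule (lowercase, collapsed whitespace), and the function names are all assumptions chosen for illustration.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())

def remove_overlap(train, test, key="prompt"):
    """Return the training examples whose normalized `key` field is absent from the test set."""
    test_keys = {normalize(ex[key]) for ex in test}
    return [ex for ex in train if normalize(ex[key]) not in test_keys]

# Hypothetical records; the field names are assumptions, not a required schema.
train = [
    {"prompt": "What is your refund policy?", "response": "Refunds are available within 30 days."},
    {"prompt": "How do I reset my password?", "response": "Use the reset link on the login page."},
]
test = [
    {"prompt": "how do I reset  my password?", "response": "Use the reset link on the login page."},
]

clean_train = remove_overlap(train, test)
print(len(clean_train))
```

Exact matching after normalization only catches near-verbatim duplicates. Paraphrased overlap would need a fuzzier comparison, which this sketch does not attempt.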
AI Review & Modification
- AI Review: Use high-capability models to systematically check your dataset for accuracy and consistency.
- Broad Modification: Perform bulk updates with Regex Keyword Replacement to align thousands of responses with new requirements or brand guidelines in a single pass over your dataset.
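The idea behind a regex-based bulk update can be sketched with Python's standard `re` module. This illustrates the general technique rather than how the Regex Keyword Replacement feature is implemented; the `bulk_replace` helper, the `response` field, and the brand names are hypothetical.

```python
import re

def bulk_replace(examples, pattern, replacement, field="response"):
    """Apply one case-insensitive regex substitution to `field` in every example.

    Returns the updated copies and the number of examples that changed.
    """
    compiled = re.compile(pattern, flags=re.IGNORECASE)
    updated = []
    changed = 0
    for ex in examples:
        new_text, n = compiled.subn(replacement, ex[field])
        if n:
            changed += 1
        # Copy the record so the original dataset is left untouched.
        updated.append({**ex, field: new_text})
    return updated, changed

# Hypothetical dataset and rename; none of these values come from the product.
dataset = [
    {"prompt": "Who makes this?", "response": "It is made by acme corp."},
    {"prompt": "Where is support?", "response": "Contact Acme Corp support."},
    {"prompt": "Is it free?", "response": "Yes, the basic plan is free."},
]

updated, changed = bulk_replace(dataset, r"\bacme corp\b", "Acme Labs")
print(changed)
```

Word boundaries (`\b`) keep the pattern from matching inside longer words, and returning the change count makes it easy to confirm the update touched the expected number of records before committing it.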
Augmentations
- Reasoning Augmentation: Teach your model to think step-by-step by attaching structured chain-of-thought reasoning to each training example.
- Generate from Examples: Use your best examples as seeds to generate dozens of similar but unique training cases.
- Variation Injection: Automatically create variations in tone and phrasing to make your model more robust to different user styles.
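To make the shape of a reasoning-augmented example concrete, the sketch below attaches a numbered list of reasoning steps to each record. It is an illustration under stated assumptions: the `reasoning` field name, the `attach_reasoning` helper, and the stub step generator are hypothetical. In practice the steps would come from a capable model or a human annotator, not a fixed template.

```python
from typing import Callable

def attach_reasoning(examples, make_steps: Callable[[dict], list]):
    """Return copies of the examples with a numbered `reasoning` field added."""
    augmented = []
    for ex in examples:
        steps = make_steps(ex)
        reasoning = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
        augmented.append({**ex, "reasoning": reasoning})
    return augmented

def stub_steps(ex):
    """Placeholder step generator; a real pipeline would call a model here."""
    return [
        f"Identify what the user is asking: {ex['prompt']}",
        "Recall the relevant policy.",
        "State the answer concisely.",
    ]

# Hypothetical training record; the schema is an assumption.
examples = [
    {"prompt": "Can I return an opened item?", "response": "Yes, within 30 days."},
]

augmented = attach_reasoning(examples, stub_steps)
print(augmented[0]["reasoning"])
```

Passing the step generator in as a function keeps the formatting logic separate from whatever produces the reasoning, so the stub can be swapped for a model call without changing the record structure.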
Next
- Add structured reasoning to your data: Reasoning Augmentation
- Combine datasets into a training recipe: Compositions
- Full end-to-end walkthrough: Fine-tune a Model