technologyneutral
AI’s Next Step: Synthetic Data
Thursday, January 9, 2025
Companies like Microsoft, Meta, and Google are already using synthetic data. Microsoft’s Phi-4 model and Google’s Gemma models were trained on a mix of real and synthetic data, and Meta’s Llama models were fine-tuned with AI-generated data.
Synthetic data offers significant benefits, such as lower costs. Writer’s Palmyra X 004 model, built mostly with synthetic data, reportedly cost only $700,000 to develop, compared to the millions of dollars some other models require.
However, synthetic data has downsides too. Training on it can contribute to model collapse, in which a model becomes less creative and more biased. And if the training data is biased, the model’s outputs will be biased as well.
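To make the idea of model collapse more concrete, here is a toy simulation. It is a simplified sketch under assumed conditions and does not reflect how Phi-4, Gemma, Llama, or Palmyra are actually trained. In this sketch, a "model" is just a Gaussian fitted to its training data, and every generation after the first is trained only on samples drawn from the previous generation's model. The function name `simulate_collapse` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_collapse(generations: int = 500, n_samples: int = 20, seed: int = 0) -> list[float]:
    """Toy model-collapse loop (an illustrative assumption, not a real training pipeline).

    Generation 0 trains on "real" data drawn from N(0, 1). Each generation fits a
    Gaussian (mean and maximum-likelihood std) to its data, and the next
    generation trains only on synthetic samples drawn from that fit.
    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # "real" data
    stds = []
    for _ in range(generations):
        mu = statistics.fmean(data)       # fit the "model" to the current data
        sigma = statistics.pstdev(data)   # maximum-likelihood spread estimate
        stds.append(sigma)
        # The next generation sees only synthetic samples from the fitted model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return stds


stds = simulate_collapse()
print(f"generation 0 std: {stds[0]:.3f}")
print(f"generation {len(stds) - 1} std: {stds[-1]:.3g}")
```

Because each fit is estimated from a small, finite sample, estimation errors compound from one generation to the next and the fitted spread tends to shrink toward zero, so later generations produce far less varied output than the original data. Keeping fresh real data in each generation's training mix is one commonly discussed way to slow this effect.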
This shift towards synthetic data is exciting but comes with challenges. As AI starts to create its own learning material, it’s crucial to ensure that the data is unbiased and effective.