The Case for Synthetic Data in AI

It’s low cost and maybe even lower quality. But synthetic data could stake a big claim via artificial intelligence.

Artificial intelligence is growing fast, and so is the cost of the data behind it.

On the upside, AI adoption is shifting into higher gear. A recent IDC study projects that global AI revenue will grow from $154 billion in 2023 to $300 billion by 2026.

Unfortunately for chief financial officers and the companies they serve, the cost of leveraging AI data is on the rise, too. A separate study projects that AI data server costs will crest $76 billion by 2028.

The Rise of Synthetic Data

Now, a new conventional wisdom is emerging among AI analysts: Synthetic data could curb the costs of operational AI data.

That’s the call from Alexander Linden, a senior analyst at Gartner. In a new Gartner Q&A, the company notes that synthetic data “will completely overshadow real data in AI models by 2030.”

In the Q&A, Linden explains synthetic data and why it’s so important to the future of AI.

Synthetic data is a class of data that is artificially generated. It contrasts with real data, directly observed in the real world. While real data is almost always the best source of insights from data, real data is often expensive, imbalanced, unavailable, or unusable due to privacy regulations.

Synthetic data can be an effective supplement or alternative to real data, providing access to better-annotated data to build accurate, extensible AI models. When combined with real data, synthetic data creates an enhanced dataset that can often mitigate the weaknesses of the real data.
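To make the idea of an “enhanced dataset” concrete, here is a minimal Python sketch of one way synthetic rows might offset an imbalanced real dataset. It is an illustration only, not a method described by Gartner or Linden: the interpolation approach is a simplified cousin of SMOTE-style oversampling, and the function names and toy data are hypothetical.

```python
import random

def synthesize_minority(samples, n_new, seed=0):
    """Create n_new synthetic rows by interpolating between random pairs of real rows.

    Assumption: at least two real rows are available to interpolate between.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)   # two distinct real rows
        t = rng.random()                # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

def balance(majority, minority, seed=0):
    """Return the real minority rows padded with synthetic rows up to the majority count."""
    deficit = max(0, len(majority) - len(minority))
    return minority + synthesize_minority(minority, deficit, seed)

# Hypothetical toy data: 8 "normal" rows vs. 3 "rare" rows, two features each.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 1.0), (11.0, 2.0), (12.0, 1.5)]

balanced_minority = balance(majority, minority)
print(len(majority), len(balanced_minority))
```

The real rows are kept untouched at the front of the list and the synthetic rows are appended, so the two can still be told apart later when results are audited.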


With these attributes, synthetic data can thrive in the artificial intelligence sector, Linden says.

Organizations can use synthetic data to test a new system where no live data exists or when data is biased. They can also use synthetic data to supplement small, existing datasets that are currently being ignored. Alternatively, they choose synthetic data when real data can’t be used, can’t be shared, or can’t be moved. In that sense, synthetic data is one further AI enabler.

Companies can leverage that scenario by “injecting” synthetic data into AI and getting robust outcomes artificially. That’s a win-win, Linden said.

There are many other forms of synthetic data, such as data augmentation or pseudonymization/anonymization, which are further types of “data synthesis.” Those methods are a must-have in any modern data science team. But, with synthetic data, professionals inject information into their AI models and obtain artificially generated data that is more valuable than direct observation.

The Risks Aren’t Artificial

While the scope and breadth of synthetic data’s applicability will make it “a critical accelerator” for AI, Linden said there are risks associated with its use.

Using synthetic data requires additional verification steps, such as the comparison of model results with human-annotated, real-world data, to ensure the fidelity of results. In addition, synthetic data may be misleading and can lead to inferior results, and synthetic data may not be 100% fail-safe when it comes to privacy.
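As a rough illustration of such a verification step, the Python sketch below trains a toy model twice, once on real data only and once on real plus synthetic data, and scores both against the same human-annotated, real-world holdout. Everything in it is an assumption made for the example: the nearest-centroid “model,” the data, and the 0.02 tolerance are hypothetical stand-ins, not anything specified in the Gartner Q&A.

```python
def centroid(rows):
    """Mean vector of a list of equal-length numeric rows."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def train(rows_by_label):
    """Toy nearest-centroid model: one mean vector per label."""
    return {label: centroid(rows) for label, rows in rows_by_label.items()}

def predict(model, row):
    """Return the label whose centroid is closest to the row."""
    def dist2(center):
        return sum((x - y) ** 2 for x, y in zip(row, center))
    return min(model, key=lambda label: dist2(model[label]))

def accuracy(model, holdout):
    """Share of holdout rows whose predicted label matches the human annotation."""
    hits = sum(predict(model, row) == label for row, label in holdout)
    return hits / len(holdout)

# Hypothetical data: real training rows, synthetic extras, and a real annotated holdout.
real = {"normal": [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)],
        "rare": [(5.0, 5.0), (6.0, 5.5)]}
synthetic = {"rare": [(5.5, 5.2), (5.8, 5.4)]}
augmented = {label: rows + synthetic.get(label, []) for label, rows in real.items()}
holdout = [((0.2, 0.3), "normal"), ((5.6, 5.1), "rare"),
           ((1.1, 0.9), "normal"), ((4.9, 5.3), "rare")]

baseline_acc = accuracy(train(real), holdout)
augmented_acc = accuracy(train(augmented), holdout)

TOLERANCE = 0.02  # hypothetical: the largest accuracy drop a team would accept
passes = augmented_acc >= baseline_acc - TOLERANCE
print(baseline_acc, augmented_acc, passes)
```

The key design choice is that the holdout contains only real, human-labeled rows. Scoring against synthetic rows would only show that the model agrees with the generator, not with the real world.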

Because of these technological challenges, user skepticism may be another hurdle for synthetic data to overcome, as users may perceive it as “inferior” or “fake” data.

With AI data costs skyrocketing and corporate budgets pinched in a tough economy, expect more questions and answers from the C-suite on synthetic data going forward.

For now, however, the prospect of AI becoming more artificial is not only on the table, it’s coming soon to a data marketplace near you.


