Dark Data

The Secret Rocket Fuel of AI

By Tony Burlinson

Companies have mountains of data outside of their core structured databases. If AI could access that ‘dark data’, it would transform how a business operates and makes decisions.

Almost every firm has decades of unstructured content that remains untouched, unclassified, and unanalyzed. The list is endless: emails, logs, dashboards, calendars, meeting notes, call transcripts, images, PDFs, spreadsheets, Word documents, IM chats, and text messages.

And that’s just scratching the surface.

For years, the complexity and cost of integrating these vast pools of unstructured dark data into a company’s data ecosystem simply couldn’t be justified.

AI fundamentally changes that value proposition.

In the age of AI, dark data can be processed, analyzed, tagged with metadata, and then connected at speed, providing unique insights and leading to a competitive advantage.

AI systems are only as strong as the data they can learn from. When organizations rely solely on their legacy structured and curated databases, they are limiting the intelligence and contextual depth of their AI models.

Dark data contains the institutional memory of the enterprise. These are the critical nuances, undocumented exceptions and tribal knowledge that structured data systems simply cannot capture.


Unlocking a corporation’s dark data allows AI to move beyond generic responses that apply to all companies and instead deliver unique insights into how a particular business really works in the real world. That includes how its suppliers and clients behave.

AI hallucinations become more likely when models lack grounding in the real world. By indexing and connecting dark data through retrieval pipelines and vector databases, organizations give AI access to real and meaningful knowledge.
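To make the idea of a retrieval pipeline concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not a production design: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory `DarkDataIndex` class is a hypothetical stand-in for a vector database, and the sample records are invented. What it shows is the basic pattern: store each piece of dark data with its source metadata, find the pieces most similar to a question, and hand only those to the model as context.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class DarkDataIndex:
    """Hypothetical in-memory index (stand-in for a vector database)."""

    def __init__(self):
        self.records = []

    def add(self, text, source):
        # Keep the source as metadata so every answer can be traced back.
        self.records.append({"text": text, "source": source, "vector": embed(text)})

    def search(self, query, top_k=2):
        q = embed(query)
        scored = [(cosine(q, r["vector"]), r) for r in self.records]
        # Drop records that share no words with the query at all.
        scored = [(s, r) for s, r in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:top_k]]


def build_prompt(question, index):
    """Ground the question in retrieved records before it reaches the model."""
    hits = index.search(question)
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample records, for illustration only.
index = DarkDataIndex()
index.add("Supplier Acme usually ships two weeks late in December.", "email")
index.add("The quarterly review meeting moved to Thursdays.", "meeting notes")
index.add("Customer asked for invoices in PDF format.", "call transcript")

prompt = build_prompt("When does supplier Acme ship late?", index)
print(prompt)
```

In a real deployment the word counts would be replaced by learned embeddings and the list by a proper vector store, but the shape of the pipeline stays the same. The source tag on each record is also where the governance questions raised later start to matter.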

Microsoft is obviously in pole position to help firms start to access some of their dark data. Firms already have the obvious first chunks of dark data in Microsoft’s cloud. However, this starts to highlight the governance minefield around dark data.

Let’s imagine a firm started ingesting instant messaging dark data into its proprietary AI platform. All the IMs that employees might have assumed were private (they aren’t) can now be read by the firm’s AI model. It now knows which managers are liked and which are disliked, and what employees really think about their firm’s leadership.

While that might not be a bad thing, it also raises a myriad of legal and ethical questions. Not to mention that employees will probably start using chat very differently if the governance model isn’t clear and transparent.

A detailed and well-thought-out AI governance model needs to be in place before companies start loading dark data into their AI models. AI governance models are not created overnight, and they will then need ongoing maintenance. (New job created by AI, anyone?)

Even considering the effort to create and maintain governance models for dark data, firms are still going to achieve a positive ROI. There are enormous business benefits from a firm’s proprietary AI model understanding how a firm really delivers business outcomes.

Companies that embed dark data into their proprietary AI models will be able to deliver substantially better customer experiences and accelerate decision-making by many multiples.

Firms that ignore their dark data are falling into the trap of not appreciating that AI has busted legacy ROI models wide open.

In its simplest form, the promise of AI is that it can perform the equivalent work of entire teams of employees, but do it twenty-four-seven and in a fraction of the time.

That baseline promise is going to challenge existing business models and workflows that have been designed on the centuries-long premise that output is limited by the number of human employees a company can hire.
