
Unlocking Unstructured Data in the Public Sector

State, local, and education (SLED) organizations have been advancing for decades: moving from filing cabinets and typewritten records to early mainframes, then digitizing paper forms, building data warehouses, and getting dashboards into the hands of decision-makers. Each leap solved the problems of its moment and set up the next one.

Today, the next step is clear: start using the data that doesn’t live in rows and columns. Most of what agencies can easily use today is tabular and structured, and that remains essential. But agencies also hold a treasure trove of PDFs and scanned forms, images and video, audio and call transcripts, logs and sensor feeds. Analyst estimates commonly put 80%–90% of newly generated enterprise data in the unstructured category, figures echoed by Indico Data and Files.com, and that range aligns with what many public organizations are seeing in practice.

Bringing that data into view isn’t about chasing a trend; it’s the logical progression of modernization. It’s also how agencies will responsibly apply AI, improve services, and make better decisions with the context they already possess.

Why unstructured data matters

Unstructured data not only captures nuance but also contains information that simply can’t be expressed in tables – conversations, images, environmental signals, video, and free-form text. And it already connects to core SLED missions. 

Here are just a few examples of how public sector organizations are putting unstructured data to work today:

  • Public safety: body-worn camera video, license plate images, and 911 call audio to accelerate investigations and improve transparency.
  • Health & human services: caseworker notes, scanned applications, and provider documentation to spot eligibility issues or speed benefits decisions.
  • Education: student essays, lecture recordings, and transcripts to support tutoring, accessibility, and early-warning indicators.
  • Transportation & infrastructure: drone imagery, right-of-way photos, and sensor streams for maintenance and safety.
  • Courts & legal: scanned filings and hearing audio to improve search, disclosure, and public access.
  • Environmental & emergency response: satellite imagery, weather models, and incident reports to inform planning and response.

Why it’s been hard (and why that’s changing)

Many government data environments were designed around tables and scheduled reports. That’s not a flaw; it’s what the job required at the time. But the result is a patchwork of tools and storage patterns across departments that makes sharing new data types difficult. Add legitimate security and compliance responsibilities (HIPAA, CJIS, FERPA) and it’s easy to see why teams hesitate when governance models aren’t crystal clear.

The good news: it’s no longer a choice between risk and progress. Modern approaches let agencies manage unstructured and structured data together with policy-driven controls, lineage, auditability, and role-appropriate access, so data can be used and protected.
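
To make "policy-driven controls" and "role-appropriate access" a little more concrete, here is a minimal sketch in Python. It assumes each data asset carries a sensitivity tag and that every read request is checked against a policy table and written to an audit log. The tag names borrow the frameworks mentioned above, but the roles, file names, and function names are hypothetical illustrations, not a compliance implementation or any particular platform's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy table: sensitivity tag -> roles allowed to read.
# Real policies would come from an agency's governance process.
POLICY = {
    "cjis": {"investigator", "records_supervisor"},
    "hipaa": {"caseworker", "clinician"},
    "ferpa": {"registrar", "advisor"},
    "public": {"*"},  # "*" means any role may read
}


@dataclass
class AuditLog:
    """Keeps one entry for every access decision, allowed or denied."""
    entries: list = field(default_factory=list)

    def record(self, user: str, role: str, asset: str, tag: str, allowed: bool) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "asset": asset,
            "tag": tag,
            "allowed": allowed,
        })


def can_read(role: str, tag: str) -> bool:
    """Unknown tags resolve to an empty role set, so they are denied by default."""
    allowed_roles = POLICY.get(tag, set())
    return "*" in allowed_roles or role in allowed_roles


def request_access(log: AuditLog, user: str, role: str, asset: str, tag: str) -> bool:
    """Check the policy and log the decision either way."""
    allowed = can_read(role, tag)
    log.record(user, role, asset, tag, allowed)
    return allowed


log = AuditLog()
granted = request_access(log, "a.lee", "caseworker", "intake_form_0042.pdf", "hipaa")
denied = request_access(log, "a.lee", "caseworker", "bodycam_clip_17.mp4", "cjis")
print(granted, denied, len(log.entries))
```

The same check applies whether the asset is a table, a scanned form, or a video clip, which is the point of governing structured and unstructured data together. Denied requests are logged too, so the audit trail shows attempts as well as successful reads.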

What “starting” can look like

Every organization’s path is different, but early wins often come from:

  • Document understanding: extracting key fields from PDFs and scanned forms (claims, applications, permits).
  • Audio/text analysis: summarizing call transcripts and case notes to surface themes, risks, or follow-ups.
  • Image/video workflows: classifying, tagging, or redacting sensitive content to speed reviews and improve privacy.
  • Sensor and log enrichment: correlating events across systems to reduce outages and improve response times.
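
As one illustration of the document understanding item above, the following is a minimal Python sketch that pulls a few key fields out of the text of a scanned form. It assumes an upstream OCR step has already converted the scan to plain text. The field names, label patterns, and sample permit are hypothetical, and real forms would likely need more robust extraction than regular expressions.

```python
import re

# Hypothetical label patterns for an illustrative permit application.
# Each real form type would need its own patterns.
FIELD_PATTERNS = {
    "applicant_name": re.compile(r"Applicant Name:\s*(.+)"),
    "permit_number": re.compile(r"Permit (?:No\.|Number):\s*([A-Z]{2}-\d{4,})"),
    "date_filed": re.compile(r"Date Filed:\s*(\d{2}/\d{2}/\d{4})"),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each known field, or None if it is absent."""
    results = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        results[name] = match.group(1).strip() if match else None
    return results


# Stand-in for OCR output from a scanned form.
sample = """City of Example - Building Permit Application
Applicant Name: Jordan Rivera
Permit No.: BP-20417
Date Filed: 03/14/2025
"""

fields = extract_fields(sample)
print(fields)
```

Returning None for missing fields, rather than guessing, lets a reviewer see exactly which forms need a manual look, which keeps an early pilot simple and well governed.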

None of this requires an overnight rebuild. Tackling simple, well-governed, high-value use cases helps agencies learn quickly and prove impact.

As technology evolves, so does the opportunity to use data more fully. Bringing unstructured information into your strategy is one way to keep modernization moving at a pace that works for your team and the communities you serve.

References

  1. Research World. Possibilities and limitations of unstructured data.
  2. Indico Data. Gartner report highlights the power of unstructured data analytics.
  3. Files.com. Unstructured data is exploding.

Last updated: December 9, 2025


The first unified platform to bring the power of AI to your data and people, so you can deliver AI’s potential to every constituent.

Databricks is a leading data and artificial intelligence (AI) company, founded by the original creators of Apache Spark™, Delta Lake, and MLflow. Their mission is to simplify and democratize data and AI so that every organization can harness its full potential.
