30 November 2023

Going Big: The Story Of Open Data

Ivan Shkvarun

To anticipate the future, we need to make sense of the past. And while the implications that open data holds for us today are wide-reaching, they also emerge from a logical evolution that I have been closely connected to throughout my professional life. This is an account of that story through the lens of my career experiences and reflections on the subject, which span the past, present and future, as well as the corporate world and the individual.

Beginnings

It was 2004, and the era of social media was yet to appear. I was a fresh-faced student just starting my third year at university when a job opportunity arose—I was invited to join the information security team of a large enterprise company.

Taking up my new post, I was confronted by a pretty formidable task. Could I create a system capable of gathering and analyzing data from various sources and storing it in an orderly and accessible way for future use? There was no existing model to work from; I had to generate something from scratch by coming up with creative solutions to the challenges that the unstructured data posed.

Setting about the task, my major considerations were twofold. Firstly, the system needed to take into account the skill level of the end users in the analytics team. Secondly, it would also have to accommodate the analysts’ demands, allowing them to look into data extracted from a range of sources. The overall goal of all this was clear—to mitigate the inherent risks the company faced through its business dealings, especially B2C collaborations.

In the course of establishing the new system, I devoted a huge amount of time to researching companies and their personnel, digging up any information of interest to gain valuable insights and identify red flags. This work also gave me the chance to start integrating automation methods and accelerating the processes. However, the sphere of data in those days was hugely underdeveloped compared to what it’s like today.

I quickly found that, as illustrated by Warren Buffett’s comments concerning the elusive first million, the circumstances surrounding the rise of a given company are often extremely hazy. With a lot of data simply unavailable online, in many cases we were obliged to seek out individuals external to these companies with inside knowledge. It was long-winded.

Web 2.0: The Open Data Explosion

We know what happened next. The colossal rise of social media—or transition to the so-called Web 2.0—brought about a sea change in the evolution of open data as a research resource.

As data started moving from private databases to the public online space, the discipline of OSINT (Open-Source Intelligence) began to take off, moving from a niche specialist practice to a fully fledged industry spanning a range of spheres. As time passed, this shift became so smooth, and the quantity of data so vast, that the repositories of this information naturally picked up a name of their own: open sources.

And once the ball got rolling, it just kept gaining momentum, with more and more people pouring their data into the online space. Then, as the internet became flooded with an ever-increasing amount of disparate information, the concept of Big Data—originally intended for different purposes—began overflowing into parallel spheres.

So these disciplines have gradually merged, as have their attendant opportunities and challenges. How can we integrate diverse forms of data? Where is it all located, and how is it accessed? How are data points interconnected, and what are they composed of? And finally, how can data be cross-checked across different sources to provide valuable insights for the researcher?

These were, and still are, pertinent questions, which naturally began to drive the tech being developed. By the same token, the interdisciplinary nature of these concepts is also reflected in the sphere of development: the tools designed for carrying out the one are now broadly equipped for dealing with the other.

Today And Tomorrow

Our current paradigm is an amalgam of data disciplines, which seem to increasingly reinforce one another. At the same time, the data landscape itself has become so immense and complex that gaining utility from this resource is now a highly demanding task.

As a result, we are seeing a disconnect where individuals and organizations understand the value of open data but are extremely hard-pressed to realize its potential. Open data has applications that far outstrip the spheres of IT and cybersecurity, yet it is specialists from these disciplines who tend to have the know-how and skills to translate unstructured open data into structured, serviceable intelligence.

Stepping stones are emerging in the form of data analysis solutions, with automation features powered by exciting and tangible developments in neural networks, large language models, and so on. But where such solutions really come into their own is through versatile, intuitive operability, allowing users of all levels to harvest the fruits of the open data wilderness.

On top of this, it should be remembered that while open data is of undoubted value, it doesn’t eclipse the need for its counterpart—closed data. Perhaps the next stage in data as a resource lies at an intersection between these branches—in a more elementary vision of information, where interconnections permeate data subject, format, location and ontology. This is the area I feel may need to be explored—one which, in my opinion, can yield new discoveries and innovations.

But ultimately, shifts in the data landscape could have a far-reaching effect on everyone, from giant corporations right down to the individual. In a world of informational abundance and proliferation, knowing how to work with various data types looks set to become an indispensable skill. And with the right tools, we have the chance to harness the winds of change and avoid drifting into the doldrums.
