The big data revolution has given immense value to data once thought to be worthless, but this has not always been for the better, according to Solid contributor and professor of decentralised web technology at Ghent University, Ruben Verborgh.
In particular, the piping of the 'new oil' into centralised proprietary repositories has resulted in a number of negative consequences. One is the creation of virtual monopolies - the Facebooks, Googles, Microsofts and Amazons - and the consequent stifling of innovation.
"Right now, who wins in this economy? It's the party with the most data. Is that the most innovative party? Absolutely not. They don't innovate because they don't have to," Verborgh argues.
"Let's say a new company has a revolutionary idea that can show you what's happening around you. Well, tough luck, they'll never make it because they don't have the data. This competition based on data harvesting kills innovation. We need competition based on quality of service, not on data harvesting," he said, speaking to Computing at the Connected Data London event.
Another major problem concerns privacy: how can we be sure who is accessing what in the data lake and where it's ending up? There are other issues too.
"It's problematic for legal reasons, for practical reasons and for societal reasons, and you also start getting conflicts, like what if my personal truth differs from yours? If you put them together, you can get contradictions, and who's right and who's wrong? If I want to see a view of the world, I will pick the sources that I trust, and it'll be different from the ones that you trust, but they're both truth within their own context."
The idea behind Solid (for Social Linked Data), the project started by Tim Berners-Lee, is to decouple data from applications and allow individuals and companies to store data such as blogs, posts, contact details, emails, likes, comments and whatever else in their own encrypted 'pods' (personal online data stores) hosted wherever they see fit. They can then grant or deny access to that data to applications that want to use it. Solid is a big part of Berners-Lee's efforts to 'redecentralise the web'.
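To make the grant-or-deny idea concrete, here is a minimal sketch in Python. It is a toy model only: the `Pod` class, its method names and the app and resource names are hypothetical illustrations, not Solid's actual API or access-control mechanism.

```python
class Pod:
    """Toy stand-in for a personal online data store (not the Solid API)."""

    def __init__(self, owner):
        self.owner = owner
        self._data = {}       # resource name -> stored value
        self._grants = set()  # (app, resource) pairs currently allowed to read

    def put(self, resource, value):
        # The owner writes data into their own pod.
        self._data[resource] = value

    def grant(self, app, resource):
        # The owner lets one application read one resource.
        self._grants.add((app, resource))

    def revoke(self, app, resource):
        # The owner withdraws that permission; the data itself stays put.
        self._grants.discard((app, resource))

    def read(self, app, resource):
        # Applications only see data they have been granted access to.
        if (app, resource) not in self._grants:
            raise PermissionError(f"{app} may not read {resource}")
        return self._data[resource]


pod = Pod("alice")
pod.put("contacts", ["bob@example.org"])
pod.grant("mail-app", "contacts")
contacts = pod.read("mail-app", "contacts")  # allowed while the grant stands
pod.revoke("mail-app", "contacts")           # later reads raise PermissionError
```

The point of the sketch is that the data never leaves the pod: switching to a different application means issuing a new grant, not copying the data somewhere else.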
"We challenge this notion that data has to be given to others in order to do services," Verborgh explained. "We keep data close to people. It makes more sense from a data perspective, because my data is about me, it makes more sense from a legal perspective because nowadays maintaining data is expensive because of GDPR, security and other things."
It also makes for easier version control and fewer collisions because there will be one master version of data out there, the one that resides in the individual's pod, and as a plus, the days of endlessly entering the same data into different forms should be over.
With the Solid model, applications become views over data rather than ingesters of it, the idea being that software companies will then compete on services provided rather than on data accrued. This depends on opening up the data silos and proprietary formats and making data interoperable.
"The kind of interoperability we're trying to realise is much, much broader [than we have today]. It's about making data flow between applications and companies and people in ways that have never been done before."
In addition, it gives much more control to data owners, he says.
"If you have a cool idea, guess what, I want to show you my data temporarily. And if I like it, I can continue. If I don't like it, I'll switch to something else."
Solid uses technologies of the semantic web such as Linked Open Data and RDF to join together small individual datasets held in pods, building a graph of relationships between data in a way that is easier for machines to navigate because the data no longer sits in proprietary silos, and with fewer privacy compromises. In addition, because the data is imbued with universally understood meaning, new insights should become available and new more individualised services become possible.
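As a rough illustration of how small datasets can be joined into one graph, the Python sketch below models each pod as a set of (subject, predicate, object) triples, the basic shape of RDF data. The pod contents and URLs are made up, the helper functions are hypothetical, and a real application would use an RDF library and query language rather than plain sets.

```python
# Each pod exposes a small set of (subject, predicate, object) triples.
# The example.* URLs and pod contents are invented for illustration.
alice_pod = {
    ("https://alice.example/#me", "foaf:name", "Alice"),
    ("https://alice.example/#me", "foaf:knows", "https://bob.example/#me"),
}
bob_pod = {
    ("https://bob.example/#me", "foaf:name", "Bob"),
}


def merge(*pods):
    """Union the triples of several pods into one graph."""
    graph = set()
    for pod in pods:
        graph |= pod
    return graph


def objects(graph, subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def names_of_friends(graph, person):
    """Follow foaf:knows links, then look up each friend's foaf:name."""
    names = []
    for friend in objects(graph, person, "foaf:knows"):
        names.extend(objects(graph, friend, "foaf:name"))
    return names


graph = merge(alice_pod, bob_pod)
friends = names_of_friends(graph, "https://alice.example/#me")
```

Because both pods identify Bob by the same URL, the link from Alice's data to Bob's name only resolves once the two datasets are combined; queried on its own, Alice's pod yields no friend names.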
That's the theory anyway, and the start-up inrupt was created by Berners-Lee a year ago with the aim of making this vision commercially viable by working on design and enterprise features that are beyond the scope of the system's core developers. However, there remain some formidable technical challenges to overcome, one being latency: querying millions of small datasets inevitably takes longer than if all the data is in one place. Another is the user experience: Verborgh describes the current interface and UX as "terrible - it's built for developers, not for people."
A few rudimentary applications are starting to appear for Solid, such as a file manager and editors, and efforts are being made towards interoperability with other projects such as the SAFE Network, but it's clear there's still a way to go before the experience is anything like 'consumer grade'.
But while there are still rough edges aplenty, the fundamentals of the stack are now in place, Verborgh insists, adding that "Tim uses it to organise his whole life and I use it to organise parts of mine too."
As to when we will see something new from inrupt (for which Verborgh is a technology advocate), the company plays its cards close to its chest.
"We're at the proverbial last mile, by which I mean 100 or 1,000 miles," he said, a little cryptically.
"We're aiming to show the rest of the world that, yes, you can make money with data in a healthy way. You don't have to harvest data in order to do meaningful things. So inrupt is paving the way for a new relationship between people, companies and data."