Linked Data Networks: the Pragmatic Semantic Web

Nigel Shadbolt

Reported by: Sandi Evans & Jaclyn Selby

Nigel Shadbolt is Professor of Artificial Intelligence (AI) and Deputy Head (Research) of the School of Electronics and Computer Science at the University of Southampton. He was a Founding Director of the Web Science Research Initiative, a joint endeavour between the University of Southampton and MIT, and is a Founding Director and Trustee of the Web Science Trust. He is also a Director of the World Wide Web Foundation. His current research focuses on developing Web-Based Semantic Technologies.

Nigel Shadbolt spoke about the Semantic Web and its potential for research. The Semantic Web refers to an emerging architecture of standards that enables data to be shared and linked on the Web. It is sometimes also referred to as Linked Data or Web 3.0. The Semantic Web offers a rich opportunity for researchers because of the potential for access to large amounts of data. Shadbolt also discussed Uniform Resource Identifiers (URIs), the identifiers used to name resources on the World Wide Web, and the Resource Description Framework (RDF), a W3C standard for representing information on the Web.

In his lecture, Shadbolt noted that people had developed a ‘romantic’ idea that the Semantic Web would be AI ‘magic’ that would deliver “proof and trust,” but AI never “had a hope of that.” He felt that people had been sidetracked from the real point, which is the Semantic Web’s great potential for information sharing. According to Shadbolt, the Semantic Web is about moving from a web of documents to “a web of data.” He noted that all HTTP (Hypertext Transfer Protocol) does is put “a thin layer of abstraction onto a hideous web of documents.” It creates physical connections between abstract machines. He cited Web addresses, the Domain Name System (DNS), routing systems and HTML (HyperText Markup Language) as examples of abstract protocols designed to “sit on top” of a variety of operating systems. What the Semantic Web does is create a method for abstracting and linking the internal components of this “web of data.” The essential idea, says Shadbolt, is to “give Web addresses to atomic facts.” The result is a set of principles for the Semantic Web that developers can then attempt to scale.
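To make the idea of giving “Web addresses to atomic facts” concrete, the following Python sketch mints an HTTP URI for each thing and then states one fact whose three parts are all Web addresses. The `http://example.org/id/` namespace, the `worksAt` property and the helper names `mint_uri` and `atomic_fact` are hypothetical illustrations, not anything Shadbolt or a particular library prescribes.

```python
from urllib.parse import quote, urlparse

# Hypothetical namespace; a publisher would use an HTTP domain it controls.
BASE = "http://example.org/id/"


def mint_uri(kind: str, name: str) -> str:
    """Give a thing (a person, an organisation, ...) its own Web address."""
    slug = quote(name.strip().replace(" ", "_"))  # percent-encode anything unsafe
    return f"{BASE}{kind}/{slug}"


def atomic_fact(subject: str, predicate: str, obj: str) -> tuple:
    """Return one subject-predicate-object fact, insisting every part is an HTTP(S) address."""
    for part in (subject, predicate, obj):
        if urlparse(part).scheme not in ("http", "https"):
            raise ValueError(f"not a Web address: {part!r}")
    return (subject, predicate, obj)


person = mint_uri("person", "Nigel Shadbolt")
university = mint_uri("org", "University of Southampton")
works_at = BASE + "vocab/worksAt"  # hypothetical property URI

fact = atomic_fact(person, works_at, university)
print(fact)
```

Because every part of the fact is an HTTP address, anyone else on the Web can refer to the same person, property or university simply by reusing the address.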

Shadbolt raised a few conceptual problems with the Semantic Web, comparing it to dark matter: it is ‘there’, but we can’t ‘feel’ it. A major difficulty lies in the problem of co-referencing. He noted that although he and Wendy Hall often work together, they do not often publish together, and so the Semantic Web as such does not recognize that they are linked. It is therefore necessary, argued Shadbolt, to take a closer look at how the Semantic Web is constituted.
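One common form of the co-referencing problem is the same person being named by different URIs in different datasets. The sketch below assumes that someone has already asserted the equivalence with `owl:sameAs` (a real OWL property widely used for this purpose) and merges the datasets accordingly. Union-find and the rule that the lexicographically smallest URI becomes canonical are illustrative choices, not a standard algorithm, and every URI other than `owl:sameAs` is hypothetical. The sketch cannot discover a missing link such as the one between Shadbolt and Hall; it only exploits links that already exist.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
WROTE = "http://example.org/vocab/wrote"           # hypothetical property
WORKS_WITH = "http://example.org/vocab/worksWith"  # hypothetical property

# Two hypothetical datasets that name the same person with different URIs,
# plus one sameAs link asserting that the two URIs co-refer.
TRIPLES = {
    ("http://papers.example.org/author/hall-w", WROTE, "http://papers.example.org/paper/1"),
    ("http://projects.example.org/people/wendy-hall", WORKS_WITH,
     "http://projects.example.org/people/nigel-shadbolt"),
    ("http://papers.example.org/author/hall-w", OWL_SAME_AS,
     "http://projects.example.org/people/wendy-hall"),
}


def find(parent: dict, x: str) -> str:
    """Return the representative of x's equivalence class (union-find with path halving)."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def canonical_uris(triples) -> dict:
    """Map every URI that appears in a sameAs link to one canonical URI (the smallest in its class)."""
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(parent, s), find(parent, o)
            if root_s != root_o:
                parent[root_s] = root_o
    classes = {}
    for uri in list(parent):
        classes.setdefault(find(parent, uri), []).append(uri)
    return {uri: min(members) for members in classes.values() for uri in members}


def merge_coreferences(triples) -> set:
    """Rewrite co-referring URIs to their canonical form and drop the sameAs links themselves."""
    canon = canonical_uris(triples)
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples if p != OWL_SAME_AS}


for triple in sorted(merge_coreferences(TRIPLES)):
    print(triple)
```

After merging, the author of the paper and the project collaborator are recognized as one person, so a query can move from the paper to the collaboration without knowing that two identifiers were involved.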

Shadbolt discussed the significance of URIs, which are Web-based identifiers that can name objects as well as the properties and relations between them (Uniform Resource Identifier, n.d.). Shadbolt defined RDF as a “knowledge representation language for the Web” that “represents information as sets of triples,” each triple consisting of a subject, a predicate and an object. RDF is a W3C standard and has become a widely used model for representing information, with several syntax formats for writing it down (Resource Description Framework, n.d.).
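Since RDF “represents information as sets of triples,” a graph can be modelled directly as a Python set of (subject, predicate, object) tuples. This minimal sketch writes such a set out in N-Triples, a W3C line-based RDF syntax. It handles only IRIs and plain string literals (no blank nodes, language tags or datatypes), and every URI except the FOAF `name` property is a hypothetical example; a real project would more likely use an RDF library such as rdflib.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class IRI:
    """A Web identifier usable as subject, predicate or object."""
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A plain string value; only allowed in the object position."""
    value: str

    def n3(self) -> str:
        # N-Triples requires backslashes, quotes and line breaks inside literals to be escaped.
        escaped = (self.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'


Triple = Tuple[IRI, IRI, Union[IRI, Literal]]


def to_ntriples(graph: Set[Triple]) -> str:
    """Serialise a graph (a set of triples) as N-Triples, one sorted statement per line."""
    lines = sorted(f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in graph)
    return "\n".join(lines) + "\n"


FOAF_NAME = IRI("http://xmlns.com/foaf/0.1/name")
WORKS_AT = IRI("http://example.org/vocab/worksAt")  # hypothetical property
nigel = IRI("http://example.org/id/person/Nigel_Shadbolt")  # hypothetical subject URI
southampton = IRI("http://example.org/id/org/University_of_Southampton")  # hypothetical

graph = {
    (nigel, FOAF_NAME, Literal("Nigel Shadbolt")),
    (nigel, WORKS_AT, southampton),
    (nigel, FOAF_NAME, Literal("Nigel Shadbolt")),  # duplicate: a set keeps only one copy
}
print(to_ntriples(graph), end="")
```

Treating the graph as a set mirrors the RDF model: stating the same triple twice adds no new information.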

Examples of RDF Sites
Shadbolt illustrated his discussion of Linked Data with several examples of current RDF sites, including DBpedia, SameAs.org and data.gov.uk; he also mentioned SPARQL, the W3C query language for RDF. DBpedia is a site that extracts structured information from Wikipedia. It is notable for enabling new mechanisms for navigating, linking to and building upon Wikipedia. According to Shadbolt, DBpedia describes about 3 million pieces of data. It is also an example of triple store technology that supports browsing, navigation and semantic queries. SameAs.org is a service that helps find different URIs referring to the same thing, one practical response to the co-referencing problem.
The UK site data.gov.uk is a second example of Linked Data. The site stems from a UK government public service mandate to provide open access to much government data, including health, education, crime, transportation and fiscal data. Shadbolt stated that the site reflects themes of transparency and citizen engagement.
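A triple store such as DBpedia answers semantic queries by matching patterns that contain variables against its stored triples, which is the core of a SPARQL basic graph pattern. The toy in-memory matcher below is a sketch of that idea, not of how DBpedia or any SPARQL engine is implemented; the `ex:` facts are illustrative stand-ins rather than real DBpedia data, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical, DBpedia-flavoured facts written with an "ex:" prefix for brevity.
STORE: Set[Triple] = {
    ("ex:University_of_Southampton", "ex:locatedIn", "ex:Southampton"),
    ("ex:MIT", "ex:locatedIn", "ex:Cambridge_Massachusetts"),
    ("ex:Southampton", "ex:country", "ex:United_Kingdom"),
    ("ex:Cambridge_Massachusetts", "ex:country", "ex:United_States"),
}


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to unify one pattern with one stored triple, extending an existing binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def query(store: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a naive basic-graph-pattern join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?uni ?city WHERE { ?uni ex:locatedIn ?city . ?city ex:country ex:United_Kingdom }
results = query(STORE, [("?uni", "ex:locatedIn", "?city"),
                        ("?city", "ex:country", "ex:United_Kingdom")])
print(results)
```

The shared variable `?city` is what joins the two patterns. A real engine would index triples by subject, predicate and object instead of scanning the whole store for every pattern.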

Opportunities and potential threats
This discussion raised several opportunities and some potential challenges. Shadbolt stated that further research is needed into the “shape and structure” of networks. Noshir Contractor noted that these new forms of large, global data sets are a huge opportunity for researchers. Although it may be difficult to get access to some forms of data, publicly available data from sources such as government agencies may be useful. Additionally, as data.gov.uk exemplifies, this form of data can serve both as a public service and as a means of holding governments accountable by making data accessible and understandable. Arguing that data empowers, Shadbolt used the example of the UK government’s decision to make bicycle accident data available, which resulted in accident-avoidance Web applications being produced in under 24 hours. He proposed that similar linked data efforts in Haiti could aid the coordination of relief efforts. Giving data URIs, according to Shadbolt, frees it in a way that being “locked up inside spreadsheets or large databases” does not. It is Shadbolt’s conviction that governments “should establish the principle that all public services should publish in reusable form all objective data.”
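One way to picture data escaping from a spreadsheet is to give every row its own URI and every column a property URI, so that each cell becomes a linkable triple. In this sketch the namespace, the column names, the helper `rows_to_triples` and the two rows are all hypothetical; they are not the UK bicycle accident data Shadbolt described.

```python
import csv
import io
from typing import List, Tuple

BASE = "http://example.org/"  # hypothetical publisher namespace

# Hypothetical rows standing in for a spreadsheet export; not real accident data.
SAMPLE_CSV = """id,date,road,severity
a1,2009-05-14,High Street,slight
a2,2009-06-02,Station Road,serious
"""


def rows_to_triples(text: str, key: str = "id") -> List[Tuple[str, str, str]]:
    """Give each row its own URI and turn every non-empty cell into a (subject, property, value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"{BASE}accident/{row[key]}"
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, f"{BASE}def/{column}", value))
    return triples


for triple in rows_to_triples(SAMPLE_CSV):
    print(triple)
```

Cell values are kept as plain strings here; a fuller conversion would also mint URIs for recurring values such as roads, so that other datasets could link to them.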

However, the Semantic Web also raises issues including privacy and data literacy. Shadbolt noted that although some people may feel comfortable with private firms such as Google managing health records, governments have “a role and responsibility to the people.” He argued that it is time to invoke the principles of data portability and transparency. Shadbolt pointed out that the Obama administration had not adopted full data portability, and that a person visiting data.gov is faced with large downloadable files that may or may not be useful. He anticipated the creation of semantic.data.gov.

Shadbolt also noted that some governments consider raw data too dangerous, arguing that some data should not be authorized for circulation because people are not data literate and cannot interpret it correctly. He then asked how this differs from the data literacy problems we already witness in print media. On data literacy, a seminar participant argued that this form of literacy is necessary to enable people to understand these newly accessible forms of data and metadata. On privacy, a seminar participant asked what mechanisms exist to balance individuals’ privacy rights with the availability of information. She gave the example of the sex offender database in the U.S., which names offenders and has been controversial for taking away individuals’ privacy without providing enough context about the seriousness of past crimes. Hall responded that the Semantic Web is akin to the World Wide Web of 1994: it is new, the rules are still being established, and no such privacy mechanism exists yet. Shadbolt also touched upon the issue of granularity in relation to privacy: if Semantic Web networks scale down to the level of the individual, privacy concerns become even more pressing.

Additional References