Open data can also be linked data - referred to as linked open data.
One of the most important forms of open data is open government data (OGD), open data created by government institutions. The importance of open government data stems from its role in citizens' everyday lives, down to the most routine and mundane tasks that seem far removed from government.
The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the principles of FAIR data and also carries an explicit data‑capable open license.
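The FAIR/O designation can be checked mechanically when dataset metadata records both FAIR compliance and a machine-readable licence identifier. A minimal sketch, assuming a small illustrative set of SPDX identifiers for data-capable open licences and a hypothetical record layout:

```python
# Minimal sketch: flagging whether a dataset record qualifies as "FAIR/O",
# i.e. FAIR-compliant *and* carrying an explicit data-capable open licence.
# The licence set and record fields here are illustrative assumptions.

OPEN_DATA_LICENSES = {
    "CC0-1.0",    # Creative Commons public-domain dedication
    "CC-BY-4.0",  # Creative Commons Attribution
    "ODbL-1.0",   # Open Database License
    "PDDL-1.0",   # Public Domain Dedication and License
}

def is_fair_o(record: dict) -> bool:
    """Return True if the record is marked FAIR and lists an open licence."""
    return bool(record.get("fair_compliant")) and \
        record.get("license") in OPEN_DATA_LICENSES

dataset = {"title": "Example survey", "fair_compliant": True, "license": "CC-BY-4.0"}
print(is_fair_o(dataset))  # an open licence plus FAIR compliance -> True
```

A record with a restrictive or missing licence fails the check even if it is otherwise FAIR, which is the distinction FAIR/O is meant to capture.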
The concept of open data is not new, but a formalized definition is relatively new. Conceptually, open data as a phenomenon denotes that governmental data should be available to anyone, with the possibility of redistribution in any form, without any copyright restriction. Another definition is the Open Definition, which can be summarized as "a piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike." Other definitions, including the Open Data Institute's "open data is data that anyone can access, use or share," offer an accessible short version while referring back to the formal definition. Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience, and biodiversity.
A major barrier to the open data movement is the commercial value of data. Access to, or re-use of, data is often controlled by public or private organizations. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions detract from the common good and that data should be available without restrictions or fees. In addition, it is important that data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by a license.
A typical depiction of the need for open data:
Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use; instead presuming that not asserting copyright enters the data into the public domain. For example, many scientists do not consider the data published with their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty it is also possible for public or private organizations to aggregate said data, claim that it is protected by copyright, and then resell it.
The issue of indigenous knowledge (IK) poses a great challenge in terms of capture, storage, and distribution. Many societies in developing countries lack the technical capacity and processes for managing IK.
While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of Open science data, since publishing or obtaining data has become much less expensive and time-consuming.
The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that "All human genomic sequence information … should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society." More recent initiatives, such as the Structural Genomics Consortium, have illustrated that the open data approach can also be used productively within the context of industrial R&D.
In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.
Examples of open data in science:
The Dataverse Network Project – archival repository software promoting data sharing, persistent data citation, and reproducible research
data.uni-muenster.de – Open data about scientific artifacts from the University of Muenster, Germany. Launched in 2011.
linkedscience.org/data – Open scientific datasets encoded as Linked Data. Launched in 2011, ended 2018.
systemanaturae.org – Open scientific datasets related to wildlife classified by animal species. Launched in 2015.
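The "Linked Data" encoding used by sites such as linkedscience.org represents each fact as a subject–predicate–object triple, with URIs naming the things described. A minimal pure-Python sketch of such a graph and a query over it; the example.org URIs are illustrative placeholders, not live identifiers, while the dcterms and FOAF predicates are real vocabulary terms:

```python
# A tiny triple store: each entry is (subject, predicate, object).
# example.org URIs are placeholders; dcterms/foaf predicates are real vocabularies.
triples = [
    ("http://example.org/dataset/42", "http://purl.org/dc/terms/title",
     "Bird sightings 2015"),
    ("http://example.org/dataset/42", "http://purl.org/dc/terms/license",
     "https://creativecommons.org/publicdomain/zero/1.0/"),
    ("http://example.org/dataset/42", "http://xmlns.com/foaf/0.1/topic",
     "http://example.org/species/parus-major"),
]

def objects_of(subject: str, predicate: str) -> list:
    """Query the graph: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://example.org/dataset/42",
                 "http://purl.org/dc/terms/title"))  # -> ['Bird sightings 2015']
```

Because subjects and objects are shared URIs, triples published by different datasets can be merged into one graph, which is what makes open data "linked."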
There are a range of different arguments for government open data. For example, some advocates contend that making government information available to the public as machine readable open data can facilitate government transparency, accountability and public participation. "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny." Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data." Open data experts have, however, nuanced the impact that opening government data may have on government transparency and accountability. In a widely cited paper, scholars David Robinson and Harlan Yu contend that governments may project a veneer of transparency by publishing machine-readable data that does not actually make government more transparent or accountable (e.g. weather, bus schedules). Drawing from earlier studies on transparency and anticorruption, World Bank political scientist Tiago C. Peixoto extended Yu and Robinson’s argument by highlighting a minimal chain of events necessary for open data to lead to accountability: i) relevant data is disclosed, ii) the data is widely disseminated and understood by the public, iii) the public reacts to the content of the data, and iv) public officials either respond to the public’s reaction, or are sanctioned by the public through institutional means (e.g., elections, recall).
Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.
Several national governments have created websites to distribute a portion of the data they collect. Some municipal governments have likewise launched collaborative projects to create and foster a culture of open government data.
At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies, and the World Bank publishes a range of statistical data relating to developing countries. The European Commission has created two portals for the European Union: the EU Open Data Portal, which gives access to open data from the EU institutions, agencies, and other bodies, and the European Data Portal, which provides datasets from local, regional, and national public bodies across Europe. The two portals were consolidated into data.europa.eu on April 21, 2021.
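Many such portals expose their dataset catalogues through JSON APIs; CKAN's `package_search` action is one widespread convention. A hedged sketch of parsing a response of that general shape — the payload below is a hand-written stand-in, not output from any real portal:

```python
import json

# Hand-written stand-in for a CKAN-style `package_search` JSON response.
# Real portals return the same overall shape with much richer records.
response_text = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2020", "license_id": "CC-BY-4.0"},
            {"name": "bus-schedules", "license_id": "CC0-1.0"},
        ],
    },
})

payload = json.loads(response_text)
# Extract the dataset names and their licences from the catalogue listing.
names = [ds["name"] for ds in payload["result"]["results"]]
print(names)  # -> ['air-quality-2020', 'bus-schedules']
```

In practice the response would come from an HTTP request to the portal's search endpoint; the parsing step shown here is the same.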
At a micro level, a business or research organization's policies and strategies towards open data will vary, sometimes greatly. However, one common strategy employed is the use of a data commons. A data commons is an interoperable software and hardware platform that aggregates (or collocates) data, data infrastructure, and data-producing and -managing applications in order to better allow a community of users to manage, analyze, and share their data with others over both short- and long-term timelines. Ideally, this interoperable cyberinfrastructure should be robust enough "to facilitate transitions between stages in the life cycle of a collection" of data and information resources while still being driven by common data models and workspace tools enabling and supporting robust data analysis. The policies and strategies underlying a data commons will also ideally involve numerous stakeholders, including the data commons service provider, data contributors, and data users.
Grossman et al. suggest six major considerations for a data commons strategy that better enables open data in businesses and research organizations. Such a strategy should address the need for:
permanent, persistent digital IDs, which enable access controls for datasets;
permanent, discoverable metadata associated with each digital ID;
data "peering," without access, egress, and ingress charges; and
a rationed approach for users computing over data in the data commons.
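The first two requirements above — persistent digital IDs that gate access, and discoverable metadata bound to each ID — can be sketched as a toy registry. All class names, fields, and the use of UUIDs as stand-in persistent identifiers are illustrative assumptions, not Grossman et al.'s design:

```python
import uuid

class DataCommons:
    """Toy registry sketching persistent IDs, attached metadata,
    and per-dataset access control. Illustrative only."""

    def __init__(self):
        self._records = {}

    def register(self, metadata: dict, allowed_users: set) -> str:
        """Mint a persistent ID and bind metadata plus an access list to it."""
        pid = str(uuid.uuid4())
        self._records[pid] = {"metadata": metadata, "allowed": set(allowed_users)}
        return pid

    def metadata(self, pid: str) -> dict:
        # Metadata stays discoverable even for users without data access.
        return self._records[pid]["metadata"]

    def can_access(self, pid: str, user: str) -> bool:
        """Access control attached to the persistent ID."""
        return user in self._records[pid]["allowed"]

commons = DataCommons()
pid = commons.register({"title": "Sequencing run 7"}, {"alice"})
print(commons.metadata(pid)["title"])   # metadata is openly discoverable
print(commons.can_access(pid, "bob"))   # but data access is controlled
```

The separation between always-readable metadata and gated data access is what lets a commons remain findable while still honouring restrictions such as privacy.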
Beyond individual businesses and research centers, and at a more macro level, countries like Germany have launched their own official nationwide open data strategies, detailing how data management systems and data commons should be developed, used, and maintained for the greater public good.
The debate on open data is still evolving. The best open government applications seek to empower citizens, to help small businesses, or to create value in some other positive, constructive way. Opening government data is only a way-point on the road to improving education, improving government, and building tools to solve other real world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.
Arguments made on behalf of open data include the following:
In scientific research, the rate of discovery is accelerated by better access to data.
Making data open helps combat "data rot" and ensure that scientific research data are preserved over time.
Statistical literacy benefits from open data. Instructors can use locally relevant data sets to teach statistical concepts to their students.
It is generally held that factual data cannot be copyrighted. However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.
Arguments against making all data available as open data include the following:
Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data, and the data will bring commercial (private) benefits to only a small number of users, those users should reimburse governments for the cost of providing the data.
Open data may lead to exploitation of, and rapid publication of results based on, data pertaining to developing countries by rich and well-equipped research institutes, without any further involvement of, or benefit to, local communities (helicopter research); this is similar to the historical open access to tropical forests that led to the misappropriation ("Global Pillage") of plant genetic resources from developing countries.
The revenue earned by publishing data can be used to cover the costs of generating and/or disseminating the data, so that the dissemination can continue indefinitely.
The revenue earned by publishing data permits non-profit organisations to fund other activities (e.g. learned society publishing supports the society).
The government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK).
Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.
Collecting, 'cleaning', managing and disseminating data are typically labour- and/or cost-intensive processes – whoever provides these services should receive fair remuneration for providing those services.
Sponsors do not get full value unless their data is used appropriately – sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
Often, targeted end-users cannot use the data without additional processing (analysis, apps, etc.); if anyone has access to the data, no one may have an incentive to invest in the processing required to make it useful (typical examples include biological, medical, and environmental data).
There is no control over the secondary use (aggregation) of open data.
Relation to other open activities
The goals of the Open Data movement are similar to those of other "Open" movements.
Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
Open specifications are openly licensed documents describing file types or protocols. Usually these specifications are primarily meant to improve interoperability between different software handling the same file types or protocols, though a monopolist forced by law to publish an open specification may still make interoperability difficult in practice.
Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
Open knowledge. Open Knowledge International argues for openness on a range of issues including, but not limited to, open data. It covers (a) scientific, historical, geographic, and other research data; (b) content such as music, films, and books; and (c) government and other administrative information. Open data falls within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.
Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
Open-GLAM (Galleries, Libraries, Archives, and Museums) is an initiative and network that supports exchange and collaboration between cultural institutions that support open access to their digitised collections. The GLAM-Wiki Initiative helps cultural institutions share their openly licensed resources with the world through collaborative projects with experienced Wikipedia editors. Open Heritage Data is associated with Open GLAM, as openly licensed data in the heritage sector is now frequently used in research, publishing, and programming, particularly in the Digital Humanities.
Open Data as commons
Ideas and definitions
Formally, both the definition of Open Data and that of the commons revolve around the concept of shared resources with a low barrier to access.
In substance, digital commons encompass Open Data, since they include resources maintained online, such as data.
Overall, looking at the operational principles of Open Data, one can see the overlap between Open Data and (digital) commons in practice. Principles of Open Data sometimes differ depending on the type of data under scrutiny. Nonetheless, they overlap considerably, and their key rationale is the absence of barriers to the re-use of data and datasets. Regardless of their origin, principles across types of Open Data point to the key elements of the definition of commons: for instance, accessibility, re-use, findability, and non-proprietary status.
Additionally, though to a lesser extent, the threats and opportunities associated with Open Data and with commons are similar. In short, they revolve around the risks and benefits of the (uncontrolled) use of common resources by a wide variety of actors.
Both commons and Open Data can be defined by the features of the resources that fit under these concepts, but they can also be defined by the characteristics of the systems their advocates push for. Governance is a focus for both Open Data and commons scholars. The key elements that mark out the peculiarities of commons and Open Data are their differences from (and perhaps opposition to) the dominant market logics shaped by capitalism. Perhaps it is this feature that emerges in the recent surge of the concept of commons as related to a more social view of digital technologies, in the specific forms of digital and, especially, data commons.
This project aimed at identifying online social relations surrounding "collaboration" in Bologna. Data were collected from social networks and online platforms for citizen collaboration, and then analyzed for content, meaning, location, timeframe, and other variables. Overall, online social relations for collaboration were analyzed using network theory. The resulting datasets have been made available online as Open Data (aggregated and anonymized); individuals can nonetheless reclaim all their data. Notably, this was done with the idea of turning the data into a commons.
This project exemplifies the relationship between Open Data and commons, and how they can disrupt the market logic driving big data use, in two ways. First, it shows how such projects, following the rationale of Open Data, can trigger the creation of effective data commons. The project itself offered different types of support to social-network platform users to have content removed. Second, opening data about interactions on online social networks has the potential to significantly reduce the monopolistic power that platforms hold over those data.
to deposit bioinformatics, atomic and molecular coordinate data, experimental data into the appropriate public database immediately upon publication of research results.
to retain original data sets for a minimum of five years after the grant. This applies to all data, whether published or not.
Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project, so that they can be checked for third party usability then shared.
Several mechanisms restrict access to or reuse of data (and several reasons for doing this are given above). They include:
making data available for a charge.
compilation in databases or websites to which only registered members or customers can have access.
use of a proprietary or closed technology or encryption which creates a barrier for access.
copyright statements claiming to forbid (or obfuscating) re-use of the data, including the use of "no derivatives" requirements.
patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented).
restriction of robot access to websites, with preference given to certain search engines.
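The last mechanism, preferential robots restrictions, can be inspected with Python's standard `urllib.robotparser`. The policy below is a made-up example that admits one search engine's crawler while excluding all other robots:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: Googlebot may crawl everything (empty Disallow),
# while every other robot is barred from the whole site.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Googlebot", "/data/set.csv"))     # True
print(parser.can_fetch("SomeOtherBot", "/data/set.csv"))  # False
```

Data published on such a site is nominally on the open web, yet machine access, and hence large-scale re-use, is reserved for the favoured crawler.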
Kassen, Maxat (1 October 2013). "A promising phenomenon of open data: A case study of the Chicago open data project". Government Information Quarterly. 30 (4): 508–513. doi:10.1016/j.giq.2013.05.012. ISSN 0740-624X.
Yu, Harlan; Robinson, David G. (28 February 2012). "The New Ambiguity of 'Open Government'". UCLA Law Review Discourse. 59. doi:10.2139/ssrn.2012489. SSRN 2012489.
Robinson, David G.; Yu, Harlan; Zeller, William P.; Felten, Edward W. (1 January 2009). "Government Data and the Invisible Hand". Yale Journal of Law & Technology. 11. SSRN 1138083.
Vaka, Avinash; Manasa, G.; Sameer, G.; Das, Bhaskarjyoti (July 2019). "Generation And Analysis Of Trust Networks". 2019 1st International Conference on Advances in Information Technology (ICAIT). pp. 443–448. doi:10.1109/ICAIT47043.2019.8987287. ISBN 978-1-7281-3241-9.
Lauterbach, Debra; Truong, Hung; Shah, Tanuj; Adamic, Lada (August 2009). "Surfing a Web of Trust: Reputation and Reciprocity on CouchSurfing.com". 2009 International Conference on Computational Science and Engineering. 4: 346–353. doi:10.1109/CSE.2009.345. ISBN 978-1-4244-5334-4. S2CID 12869279.
Tagiew, Rustam; Ignatov, Dmitry I.; Delhibabu, Radhakrishnan (2015). "Hospitality Exchange Services as a Source of Spatial and Social Data?". 2015 IEEE International Conference on Data Mining Workshop (ICDMW). Atlantic City. pp. 1125–1130. doi:10.1109/ICDMW.2015.239. ISBN 978-1-4673-8493-3.
Tagiew, Rustam; Ignatov, Dmitry I.; Delhibabu, Radhakrishnan (2015). "Economics of Internet-Based Hospitality Exchange". 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). Singapore. pp. 493–498. arXiv:1501.06941. doi:10.1109/WI-IAT.2015.89. ISBN 978-1-4673-9618-9.