Open Data Driven Science: A Boon Or A Bane?

December 15, 2018 by Promit Ray

Harnessing existing data for scientific progress dates back to the 17th century, when Kepler used observations collected by Tycho Brahe to formulate the laws of planetary motion. Their work is often cited as a classic early example of successful scientific collaboration, and hundreds of years later we continue to follow in Kepler's footsteps. Modern science progresses much like a dwarf riding on the shoulders of a giant, the giant being the mass of data now available. The past decade has witnessed unprecedented development in data science, machine learning, and artificial intelligence.

The wide availability of research data has given credence to Einstein's famous quote, "never stop asking questions". Shared, publicly available data can now be used to answer the questions scientists are being encouraged to ask. This, however, is a recent phenomenon. The earlier 'publish or perish' mentality that dominated scientific publication left little room for sharing data that was not publishable. Data that did not immediately appear relevant was therefore not made available for others to use, dwell on, and make relevant. Thankfully, this trend is changing fast. The digital universe is now expanding by 40 percent every year. Advances in research have led to an exponential increase in the scientific data produced every day, ranging from astronomical observations to material properties. Starting with CERN, several research organizations have joined hands in mining the available data. A pertinent example is the development of novel materials via the NOMAD (Novel Materials Discovery) project and the Materials Project. Protein data banks and gene banks have triggered a multitude of research projects all over the world. Such collaborative efforts help prevent repetition and mistakes, and also foster mutual intellectual respect among scientists.

Making data widely available, however, comes with its own challenges. Databases need to be constantly filtered, checked, and updated so that accurate data is easily available to non-specialists. The data presented must be transparent and accurate. Stringent guidelines for producing and sharing information could therefore become both a necessity and an advantage in the long run. In the words of Karl Popper, "good tests kill flawed theories, we remain alive to guess again". A truly data-driven scientific society will facilitate an objective evaluation of ideas, constantly pushing innovation to its limits. Researchers from all over the world will have a chance to verify and assess the reliability of deposited data. Data that appears irrelevant today can trigger a major scientific breakthrough later. We open ourselves to these possibilities by removing paywalls and boundaries. Different patterns can often be extracted from the same data when it is viewed from different perspectives, and varying interpretations across a wide variety of datasets can lead to endless possibilities. As a line from The Vampire Diaries goes, 'there is no such thing as bad ideas, just poorly executed awesome ideas'.

Post written by Promit Ray

Promit Ray is a chemistry graduate with a love for scientific writing. He is passionate about observing and learning from patterns in data and greatly enjoys explaining complicated science simply. He is currently at the end of his PhD in computational chemistry at the University of Bonn, Germany, following a master's degree from the Jawaharlal Nehru Centre for Advanced Scientific Research, India. A materials scientist by training, he writes on demystifying scientific concepts, careers, and life in the sciences, and generally aims to involve the community in popular science. Always happy to chat about science communication and new projects, he can be reached at

