Automation and Science

Automation
Science is in the midst of a data crisis. Last year, there were more than 1.2 million new papers
published in the biomedical sciences alone, bringing the total number of peer-reviewed biomedical
papers to over 26 million. However, the average scientist reads only about 250 papers a
year. Meanwhile, the quality of the scientific literature has been in decline. Some recent studies found
that the majority of biomedical papers were irreproducible.
The twin challenges of too much quantity and too little quality are rooted in the finite
neurological capacity of the human mind. Scientists are deriving hypotheses from a smaller and smaller
fraction of our collective knowledge and consequently, more and more, asking the wrong questions,
or asking ones that have already been answered. Also, human creativity seems to depend increasingly
on the stochasticity of previous experiences—particular life events that allow a researcher to notice
something others do not. Although chance has always been a factor in scientific discovery, it is
currently playing a much larger role than it should.
One promising strategy to overcome the current crisis is to integrate machines and artificial
intelligence into the scientific process. Machines have greater memory and higher computational
capacity than the human brain. Automation of the scientific process could greatly increase the rate of
discovery. It could even begin another scientific revolution. That huge possibility hinges on an equally
huge question: Can scientific discovery really be automated?
I believe it can, using an approach that we have known about for centuries. The answer to this
question can be found in the work of Sir Francis Bacon, the 17th-century English philosopher and a
key progenitor of modern science.
The first iterations of the scientific method can be traced back many centuries earlier to
Muslim thinkers such as Ibn al-Haytham, who emphasized both empiricism and experimentation.
However, it was Bacon who first formalized the scientific method and made it a subject of study. In
his book Novum Organum (1620), he proposed a model for discovery that is still known as the
Baconian method. He argued against syllogistic logic for scientific synthesis, which he considered to
be unreliable. Instead, he proposed an approach in which relevant observations about a specific
phenomenon are systematically collected, tabulated and objectively analyzed using inductive logic to
generate generalizable ideas. In his view, truth could be uncovered only when the mind is free from
incomplete (and hence false) axioms.
The Baconian method attempted to remove logical bias from the process of observation and
conceptualization, by delineating the steps of scientific synthesis and optimizing each one separately.
Bacon’s vision was to leverage a community of observers to collect vast amounts of information about
nature and tabulate it into a central record accessible to inductive analysis. In Novum Organum, he
wrote: “Empiricists are like ants; they accumulate and use. Rationalists spin webs like spiders. The
best method is that of the bee; it is somewhere in between, taking existing material and using it.”
The Baconian method is rarely used today. It proved too laborious and extravagantly expensive;
its technological applications were unclear. However, at the time the formalization of a scientific
method marked a revolutionary advance. Before it, science was metaphysical, accessible only to a few
learned men, mostly of noble birth. By rejecting the authority of the ancient Greeks and delineating
the steps of discovery, Bacon created a blueprint that would allow anyone, regardless of background,
to become a scientist.
Bacon’s insights also revealed an important hidden truth: the discovery process is inherently
algorithmic. It is the outcome of a finite number of steps that are repeated until a meaningful result is
uncovered. Bacon explicitly used the word “machine” in describing his method. His scientific
algorithm has three essential components: First, observations have to be collected and integrated into
the total corpus of knowledge. Second, the new observations are used to generate new hypotheses.
Third, the hypotheses are tested through carefully designed experiments.
If science is algorithmic, then it must have the potential for automation. This futuristic dream
has eluded information and computer scientists for decades, in large part because the three main steps
of scientific discovery occupy different planes. Observation is sensory; hypothesis-generation is
mental; and experimentation is mechanical. Automating the scientific process will require the effective
incorporation of machines into each step, with all three feeding into each other without friction. Nobody
has yet figured out how to do that.
Experimentation has seen the most substantial recent progress. For example, the
pharmaceutical industry commonly uses automated high-throughput platforms for drug design.
Startups such as Transcriptic and Emerald Cloud Lab, both in California, are building systems to
automate almost every physical task that biomedical scientists do. Scientists can submit their
experiments online, where they are converted to code and fed into robotic platforms that carry out a
battery of biological experiments. These solutions are most relevant to disciplines that require intensive
experimentation, such as molecular biology and chemical engineering, but analogous methods can be
applied in other data-intensive fields, and even extended to theoretical disciplines.
Automated hypothesis-generation is less advanced, but the work of Don Swanson in the 1980s
provided an important step forward. He demonstrated the existence of hidden links between unrelated
ideas in the scientific literature; using a simple deductive logical framework, he could connect papers
from various fields with no citation overlap. In this way, Swanson was able to hypothesize a novel link
between dietary fish oil and Raynaud’s syndrome without conducting any experiments or being an
expert in either field. Other, more recent approaches, such as those of Andrey Rzhetsky at the
University of Chicago and Albert-László Barabási at Northeastern University, rely on mathematical
modeling and graph theory. They incorporate large datasets, in which knowledge is projected as a
network, where nodes are concepts and links are relationships between them. Novel hypotheses would
show up as undiscovered links between nodes.
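
The idea of hypotheses as undiscovered links can be illustrated with a small co-occurrence graph. The sketch below is a simplified stand-in, not the actual method of Swanson, Rzhetsky or Barabási: it treats each paper as a set of concepts, links concepts that appear together, and flags unlinked pairs that share intermediate concepts. The paper records are invented for illustration, and the function names `build_graph` and `candidate_links` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy records: each "paper" is the set of concepts it mentions.
papers = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "Raynaud's syndrome"},
    {"platelet aggregation", "Raynaud's syndrome"},
    {"fish oil", "triglycerides"},
]


def build_graph(papers):
    """Nodes are concepts; a link means two concepts share at least one paper."""
    graph = defaultdict(set)
    for concepts in papers:
        for a, b in combinations(sorted(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_links(graph):
    """Map each unlinked pair to the intermediate concepts that connect it."""
    candidates = {}
    for a, c in combinations(sorted(graph), 2):
        if c in graph[a]:
            continue  # already linked directly in the literature
        shared = graph[a] & graph[c]
        if shared:
            candidates[(a, c)] = shared
    return candidates


graph = build_graph(papers)
# Rank by number of shared intermediates, then alphabetically for stable output.
ranked = sorted(candidate_links(graph).items(), key=lambda kv: (-len(kv[1]), kv[0]))
for (a, c), shared in ranked:
    print(a, "<->", c, "via", sorted(shared))
```

On this toy data, fish oil and Raynaud's syndrome never share a paper, yet they are connected through two intermediate concepts, so the pair ranks among the strongest candidates. A count of shared neighbours is a crude score; it also surfaces pairs of intermediates, which a real system would need to filter or weight.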
The most challenging step in the automation process is how to collect reliable scientific
observations on a large scale. There is currently no central data bank that holds humanity’s total
scientific knowledge at an observational level. Natural-language processing has advanced to the point
at which it can automatically extract not only relationships but also context from scientific papers.
However, major scientific publishers have placed severe restrictions on text-mining. More important,
the text of papers is biased towards the scientist’s interpretations (or misconceptions), and it contains
synthesized complex concepts and methodologies that are difficult to extract and quantify.
Nevertheless, recent advances in computing and networked databases make the Baconian
method practical for the first time in history. And even before scientific discovery can be automated,
embracing Bacon’s approach could prove valuable at a time when pure reductionism is reaching the
edge of its usefulness.
Human minds simply cannot reconstruct highly complex natural phenomena efficiently enough
in the age of big data. A modern Baconian method that incorporates reductionist ideas through
data-mining, but then analyzes this information through inductive computational models, could transform
our understanding of the natural world. Such an approach would enable us to generate novel
hypotheses that have higher chances of turning out to be true, to test those hypotheses, and to fill gaps
in our knowledge. It would also provide a much-needed reminder of what science is supposed to be:
truth-seeking, anti-authoritarian, and limitlessly free.
