Scanning 10 Billion Files in 45 Minutes
Data scientists are hard at work dealing with today’s unmanageable volumes of data in fields such as particle physics, human resources, automation, weather prediction, law, cancer research, and more. Humans in these and other professions are simply overwhelmed when trying to interpret huge amounts of mind-numbing data. It is telling that IBM Watson can read and interpret 500 gigabytes of data per second, the equivalent of a million books or 800 million pages. What’s more, IBM Watson can “read” 10 billion files in 45 minutes!
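To put the headline figure in perspective, the short calculation below converts "10 billion files in 45 minutes" into a per-second rate and derives the book length implied by the other figures. It uses only the numbers quoted above, which are not independently verified here; the function names are illustrative.

```python
# Figures as quoted in the article (taken at face value, not verified).
FILES = 10_000_000_000        # 10 billion files
MINUTES = 45
BOOKS = 1_000_000             # "a million books"
PAGES = 800_000_000           # "800 million pages"


def files_per_second(files, minutes):
    """Average number of files processed per second."""
    return files / (minutes * 60)


def pages_per_book(pages, books):
    """Average book length implied by the two equivalents."""
    return pages / books


print(f"{files_per_second(FILES, MINUTES):,.0f} files per second")
print(f"{pages_per_book(PAGES, BOOKS):.0f} pages per book")
```

In other words, the claim works out to roughly 3.7 million files every second, and the "million books" comparison assumes books of about 800 pages each.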
The current strategy for data scientists is to use deep learning and artificial intelligence to create new sets of algorithms that model highly abstract data. This is accomplished by using multiple non-linear processing layers, each of which yields a new representation of the data. Workloads of this kind are often run with Massively Parallel Processing (MPP), in which different parts of a software program run simultaneously and separately to yield new data insights. There is no end in sight for the development of new and more intelligent computational tools to derive new understanding from the analysis of massive data sets.
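The idea of "multiple non-linear processing layers that yield new representations of data" can be illustrated with a minimal sketch. The weights, layer sizes, and the choice of ReLU as the non-linearity are arbitrary assumptions made for illustration; this is not how IBM Watson or any particular system is implemented.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity: negative values become zero, positive values pass through."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Pass the data through each layer in turn; every layer's output is a new representation."""
    activations = inputs
    for weights, biases in layers:
        activations = relu(dense(activations, weights, biases))
    return activations


# Two hypothetical layers with hand-picked weights: 2 inputs -> 2 units -> 1 unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]

print(forward([2.0, 1.0], layers))  # prints [2.0]
```

In a real system the weights are learned from data rather than written by hand, and each layer may contain thousands of units, which is where parallel hardware becomes important.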
LHC Data Produced at a Rate of 1 Petabyte Per Second
One of the largest creators of new data is the Large Hadron Collider (LHC) at CERN near Geneva, Switzerland. Its detectors generate raw collision data at a rate of roughly 1 petabyte per second, but only a small filtered fraction is kept: about 3 gigabytes (GB) of data per second, or about 25 petabytes (25 million gigabytes) per year. Not surprisingly, CERN has created a huge worldwide computing grid of 170 data centers in 42 countries with more than 530 petabytes of storage.
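A quick back-of-the-envelope check shows how the figures of 3 GB per second and 25 petabytes per year relate to each other. The calculation uses only the numbers quoted above and the article's conversion of 1 petabyte to 1 million gigabytes. The reading that data is not recorded around the clock is an inference from the arithmetic, not a statement from the source.

```python
# Figures as quoted in the article.
GB_PER_SECOND = 3
PB_PER_YEAR = 25
GB_PER_PB = 1_000_000          # "25 petabytes (25 million gigabytes)"
SECONDS_PER_DAY = 86_400

# How many days of recording at 3 GB/s are needed to accumulate 25 PB?
days_needed = PB_PER_YEAR * GB_PER_PB / GB_PER_SECOND / SECONDS_PER_DAY

# How much would accumulate if 3 GB/s were sustained for a full 365-day year?
pb_if_continuous = GB_PER_SECOND * SECONDS_PER_DAY * 365 / GB_PER_PB

print(f"{days_needed:.0f} days at 3 GB/s yields 25 PB")
print(f"{pb_if_continuous:.1f} PB per year if recording never stopped")
```

The two figures agree if data is recorded at 3 GB per second for roughly 96 days per year; running at that rate continuously would instead produce close to 95 petabytes annually.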
[Figure: map of the LHC computing grid]
Running Simulations with LHC Data Deluge
A key advantage of simulations is the ability to run scenarios and test theories while the LHC is turned off. In fact, the Higgs boson was discovered by CERN scientists working with data from LHC Run 1.
With the growing amount of LHC data, scientists are using simulations to help make new discoveries and answer questions such as:
- What is the role of supersymmetric particles?
- What is the actual nature of dark matter?
- Are there parallel universes?