4/26/2016

That mighty thud was CERN dropping 300TB of raw collider data to the Internet


Most of what CERN does sounds like the rarefied heights of sci-fi, accessible only to physicists with badges and pocket protectors, or academics who use esoteric software — not to mere mortals like you and me. What even happens in an atom smasher? CERN is hunting answers to big questions like what dark matter is and why there’s so much of it, why the fundamental forces seem to merge into one at extreme temperatures, why gravity behaves as it does, and other big-topic questions that seem to have one foot firmly in the realm of philosophy.

The LHC has been generating huge amounts of data for release to the general public since its first successful run in 2010. Continuing this trend, CERN just put out another big chunk of data for public analysis, some 300TB of partially organized results from the LHC’s operations since 2014 (when CERN did its first big data dump). CERN hopes to engage the curiosity of physicists around the world, whether amateur, academic or professional, and get them learning about particle physics and doing hands-on analysis of data from its experiments. Many of the LHC’s current experiments and projects have a crowd-sourced component, relying on distributed computing much as Folding@home and SETI@home do. There are several experimental collider datasets on the CERN open data site that anyone can download. The LHC@home springboard page provides an overview of the distributed computing projects the LHC is currently involved in, including LHCb, ATLAS, and ALICE.

The ATLAS project is probing for fundamental particles like the Higgs boson, as well as looking for information about dark matter and extra dimensions. It records the path, energy and identity of particles traveling through the collider and then performs offline event reconstruction based on the data banked from the ATLAS detectors. This turns the raw stream of numbers into recognizable things like photons and leptons so that they can be analyzed. (If you’re a little rusty on your quantum chromodynamics, CERN put out a PDF primer about the LHC that should help to get you up to speed.) Since the ATLAS detectors create petabytes of data during each experiment, the project needs a substantial amount of computational muscle to run reconstructions and, in their words, extract physics from the data. They specifically call out to grad students: for physics majors trying to come up with a master’s project, it might not hurt to get involved with ATLAS.

ALICE is a different ball of wax. Where the ATLAS experiment deals with colliding protons and tries to identify particles, the ALICE detector studies conditions like those found just after the Big Bang. For part of each operating year, the LHC fires lead ions instead of protons, creating conditions so extreme that protons and neutrons in the lead nuclei can “melt,” freeing constituent quarks and gluons from their mutual bond and creating a quark-gluon plasma. It is the ALICE project’s mission to explore what happens to particles under these extreme conditions, leading to insights on the nature of matter and the birth of the universe.

Prototyping is in progress for the High-Luminosity upgrade to the LHC (HL-LHC), which will create a sort of broadband production of data so that particle physics, which relies on statistics, can prod the Standard Model faster. This is a massive, profoundly collaborative project. The upgrade itself relies on innovations in several fields including magnetics, optics and superconductors. It’ll have a set of shiny quadrupole superconducting niobium-tin magnets, used to better focus the particle beam, and new high-temperature superconducting electrical lines capable of supporting currents of record intensities.
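
To put a rough number on why a statistics-driven field wants more collisions: for a simple counting measurement, the relative statistical uncertainty shrinks roughly as one over the square root of the number of events recorded. The Python sketch below illustrates only that scaling. The baseline of 10,000 events and the function names are made up for illustration; they are not HL-LHC figures or anything from CERN's software.

```python
import math


def relative_uncertainty(n_events: int) -> float:
    """Relative statistical (Poisson) uncertainty on a count of n_events."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return 1.0 / math.sqrt(n_events)


def data_factor_for_precision(improvement: float) -> float:
    """How many times more events it takes to shrink the uncertainty by `improvement`."""
    return improvement ** 2


# Hypothetical baseline, chosen only to make the scaling easy to read.
baseline = 10_000
for factor in (1, 10, 100):
    n = baseline * factor
    print(f"{n:>9} events -> {relative_uncertainty(n):.2%} relative uncertainty")
```

Under this simple model, cutting the uncertainty by a factor of ten takes a hundred times as many events, which is the basic case for a higher-luminosity machine.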

Once they fire up the HL-LHC, the volume of data produced will make 300TB look as small as the particles they’re accelerating. It’s expected to start operation around 2025, but we’ll be watching their updates the whole way.