How Vera Rubin Telescope Scientists Will Deal With 60 Million Billion Bytes of Imagery

The Vera C. Rubin Observatory will make the study of stars and galaxies more like the big-data sorting exercises of contemporary genetics and particle physics.

It was not that long ago that astronomers would spend a night looking through a telescope, making careful observations of one or a few points of light.

Based on those few observations, they would extrapolate broad generalizations about the universe.

“It was all people could really do at the time, because it was hard to collect data,” said Leanne Guy, the data management scientist at the new Vera C. Rubin Observatory.

Rubin, located in Chile and financed by the U.S. Department of Energy and the National Science Foundation, will inundate astronomers with data.

Each image taken by Rubin’s camera consists of 3.2 billion pixels that may contain previously undiscovered asteroids, dwarf planets, supernovas and galaxies. And each pixel records one of 65,536 shades of gray. That’s 6.4 billion bytes of information in just one picture. Ten of those images would contain roughly as much data as all of the words that The New York Times has published in print during its 173-year history. Rubin will capture about 1,000 images each night.
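
The arithmetic behind those figures is straightforward. A back-of-the-envelope sketch using the numbers above (65,536 gray levels is 16 bits, or two bytes, per pixel):

```python
# Rough data-volume arithmetic from the figures in this article.
PIXELS_PER_IMAGE = 3_200_000_000  # 3.2 billion pixels per camera image
BYTES_PER_PIXEL = 2               # 65,536 gray levels = 16 bits = 2 bytes
IMAGES_PER_NIGHT = 1_000

bytes_per_image = PIXELS_PER_IMAGE * BYTES_PER_PIXEL
bytes_per_night = bytes_per_image * IMAGES_PER_NIGHT

print(f"{bytes_per_image:,} bytes per image")  # 6,400,000,000
print(f"{bytes_per_night:,} bytes per night")  # 6,400,000,000,000 (~6.4 terabytes)
```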

As the data from each image is quickly shuffled to the observatory’s computer servers, the telescope will pivot to the next patch of sky, taking a picture every 40 seconds or so.

It will do that over and over again almost nightly for a decade.

The final tally will come to about 60 million billion bytes of image data. That is a “6” followed by 16 zeros: 60,000,000,000,000,000.

Astronomy is following in the path of scientific fields like biology, which today is awash in DNA sequences, and particle physics, in which scientists must sift through torrents of debris from particle collisions to tease out hints of something new.

“We produce lots of data for everyone,” said William O’Mullane, the associate director for data management at the observatory. “So this idea of coming to the telescope and making your observation doesn’t exist, right? Your observation was made already. You just have to find it.”

Astronomers will be able to do their research anytime, anywhere, relying on high-speed networks, cloud computing and the algorithms of artificial intelligence to sift out discoveries.


All that data needs to be stored and processed.

To do that, Dr. O’Mullane oversaw the construction of a state-of-the-art data center at Rubin with enough storage to retain a month’s worth of images in case of a lengthy network disruption.

The Vera C. Rubin Observatory, under a night sky partially obscured by clouds, last month. (Marcos Zegers for The New York Times)
The back of the camera and the primary mirror of the telescope mount assembly. (Marcos Zegers for The New York Times)

Maintaining the nearly 60 miles of fiber-optic cables that connect the observatory to the city of La Serena, Chile, can be challenging. People have stolen equipment. A fire on the road and a truck hitting a pole have caused outages. Dr. O’Mullane said that once someone used a cable for shooting practice.

When the data is flowing, it is sent to the SLAC National Accelerator Laboratory, a Department of Energy research center in Menlo Park, Calif., for calculations that go beyond the initial analysis at the observatory.

Although Rubin will take a thousand images a night, those are not what will be sent out into the world at first. Rather, the computers at SLAC will create small snapshots of what has changed compared with what the telescope saw previously.

For each new image, obvious blemishes, like streaks from passing satellites and smudges generated by cosmic rays hitting the camera sensors, will be erased. “We try to filter out the non-astronomical garbage,” Dr. O’Mullane said.

Then the software will compare the scene with a template that combines at least three earlier observations of the same part of the sky.

The template is an image of the sky built up from at least three previous images. That is then compared with the latest image taken by the Rubin telescope (labeled “Science”). The template is subtracted from the new image, leaving only what is new or changed. (Vera C. Rubin Observatory)

When the template is subtracted from the latest image, anything that is unchanged disappears. What is left are features that have changed. Those include exploding stars known as supernovas, variable stars that have brightened or dimmed and asteroids that are passing by.
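
Conceptually, the subtraction step can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the observatory's pipeline, which also aligns the images, matches their blurring and calibrates their brightness before differencing:

```python
import numpy as np

def build_template(exposures: list[np.ndarray]) -> np.ndarray:
    """Combine at least three aligned exposures of the same field.
    A pixel-wise median rejects one-off artifacts (satellite streaks,
    cosmic-ray hits) that appear in only a single exposure."""
    if len(exposures) < 3:
        raise ValueError("template needs at least three exposures")
    return np.median(np.stack(exposures), axis=0)

def find_changes(science: np.ndarray, template: np.ndarray,
                 n_sigma: float = 5.0) -> np.ndarray:
    """Subtract the template from the new 'science' image and flag
    pixels that changed by more than n_sigma times the noise."""
    diff = science - template
    noise = diff.std()                     # crude global noise estimate
    return np.abs(diff) > n_sigma * noise  # True where something changed

# Everything unchanged cancels out in the subtraction; the flagged
# pixels that remain mark candidate supernovas, variable stars and
# passing asteroids -- each one the seed of an alert.
```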

Just one image will contain about 10,000 highlighted changes. An alert will be generated for each change — some 10 million alerts a night.

It is like an astronomical version of “Where’s Waldo?”


To classify the objects spotted outside the solar system, Rubin turns to nine outside organizations known as data brokers. These automated software systems will perform additional analysis, pull out data of interest for individual astronomers and identify intriguing events that warrant follow-up observations by other telescopes.

There are differences in each data broker’s focus and approach.

“It’s better to send that out to a global community of scientists with a lot of different skills and expertise to bring in their knowledge,” Dr. Guy said.

A data broker named Antares, created by the National Science Foundation’s National Optical-Infrared Astronomy Research Laboratory, or NOIRLab, will run the alerts through 20 general filters to pull out changes of wide interest, including certain supernovas.

Its analysis is flexible. Astronomers will be able to write their own filters to find just the events they want to study.
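
In spirit, a user-written filter is just a small function applied to every alert in the stream. The sketch below is hypothetical; the field names and thresholds are invented for illustration and are not the real Antares interface:

```python
def bright_young_supernova(alert: dict) -> bool:
    """Hypothetical filter: keep alerts that look like young, bright
    supernovas. Field names and cutoffs are illustrative only."""
    return (
        alert.get("classification") == "supernova"
        and alert.get("magnitude", 99.0) < 19.0               # bright enough to follow up
        and alert.get("days_since_first_detection", 99) < 3   # caught early
    )

# A stand-in for a night's stream of millions of alerts.
alert_stream = [
    {"classification": "supernova", "magnitude": 18.2, "days_since_first_detection": 1},
    {"classification": "variable_star", "magnitude": 15.0, "days_since_first_detection": 40},
]
interesting = [a for a in alert_stream if bright_young_supernova(a)]
print(interesting)  # only the young, bright supernova survives
```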

“We add contextual information from existing astronomical catalogs,” said Tom Matheson, who leads the Antares team. “When an alert comes in, we say, ‘Do any of these catalogs have information about that object?’ And then we incorporate that into the alert so people can know more about it.”
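
Cross-matching an alert against existing catalogs amounts to asking whether any cataloged object sits at nearly the same position on the sky. A sketch of the idea using the astropy library, with made-up coordinates:

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

# One incoming alert position and a tiny stand-in catalog
# (the coordinates here are invented for illustration).
alert_pos = SkyCoord(ra=150.1 * u.deg, dec=-2.3 * u.deg)
catalog = SkyCoord(ra=[150.1002, 210.5] * u.deg, dec=[-2.2999, 12.0] * u.deg)

# Find the nearest catalog entry and its angular separation.
idx, separation, _ = alert_pos.match_to_catalog_sky(catalog)
if separation < 1.0 * u.arcsec:  # close enough to call it the same object
    print(f"Known object: catalog entry {idx}, {separation.to(u.arcsec):.2f} away")
```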

A Chilean data broker called ALeRCE — Automatic Learning for the Rapid Classification of Events — takes what could be regarded as a simpler and broader approach: sorting all of the non-solar-system alerts into 22 buckets. Those include different flavors of supernovas as well as bursts of radiation from supermassive black holes, young stars and white dwarfs.

“They will be well curated by specialists in all areas,” said Francisco Förster, director of the Millennium Institute of Astrophysics in Chile and principal investigator of ALeRCE.

ALeRCE does not offer the kind of flexible, user-written analysis that Antares does, but it employs several types of classification techniques.

Calibrating the camera. (Marcos Zegers for The New York Times)
Evening operations at the observatory. (Marcos Zegers for The New York Times)

Two of those techniques use classic machine-learning methods, which categorize events based on preselected criteria. (This is equivalent to defining Waldo as a cartoon human with brown hair, wearing a striped shirt, glasses and a beanie.)

Other techniques rely on neural networks and other modern deep-learning methods. These pull in raw data and independently invent their own criteria for identifying different cosmic phenomena. (Imagine a computer figuring out, on its own, that Waldo also often is seen holding a walking stick.)
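
The distinction can be made concrete with a toy example. Below, a classic machine-learning classifier is trained on hand-picked features; the feature names, values and labels are synthetic, purely for illustration (using scikit-learn):

```python
from sklearn.ensemble import RandomForestClassifier

# Classic machine learning: the astronomer chooses the features.
# Each row describes one event with preselected properties
# [peak_brightness, rise_days, fade_days, color_index].
features = [
    [18.5, 15.0, 60.0, 0.3],  # supernova-like
    [16.0,  2.0,  2.0, 1.1],  # variable-star-like
    [18.7, 14.0, 55.0, 0.2],  # supernova-like
    [15.8,  1.5,  2.5, 1.0],  # variable-star-like
]
labels = ["supernova", "variable_star", "supernova", "variable_star"]

model = RandomForestClassifier(n_estimators=100).fit(features, labels)
print(model.predict([[18.4, 13.0, 58.0, 0.25]]))  # -> ['supernova']

# A deep-learning classifier would instead ingest the raw pixel cutouts
# and brightness histories and learn its own features -- inventing its
# own criteria for spotting Waldo.
```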

Several of the data brokers, including ALeRCE, tested their systems using data from the Zwicky Transient Facility near San Diego, which uses a smaller telescope.

One surprise, Dr. Förster said, was that for years the supposedly more sophisticated deep-learning models failed when applied to real-time data from Zwicky. But that might have resulted from limited training data.

“Everything indicates that deep learning should win this in the future, as you get more data,” Dr. Förster said.

Michael Wood-Vasey, an astronomy professor at the University of Pittsburgh, decided to create a data broker because he thought major technology companies like Google had already solved similar challenges.

“I was like, wait a minute, we have YouTube and content distribution networks for all the major things,” Dr. Wood-Vasey said. “We shouldn’t technically try to reinvent this.”

He teamed up with Ross Thomson, who works on high-performance computing projects at Google, to use the company’s cloud computing platform.

“Lacking brand consultants or anything, we just called it the Pitt-Google broker,” Dr. Wood-Vasey said.

A couple of the other data brokers are focusing on specific slices of the data. One will compile information on asteroids and other small objects zipping through the solar system, calculating properties like the color and rotation rate. Another will track the behavior of variable stars.


Beyond the alerts, the Rubin software will combine images for more detailed analysis.

For each image, the telescope uses one of six filters, which range from ultraviolet to infrared wavelengths. The filter changes the view, much like a pair of sunglasses. Three images taken through different filters can be combined into a color picture.

Images can also be added together, essentially a longer exposure to make fainter objects visible.
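
Both operations are, at heart, simple array manipulations. A minimal sketch with NumPy, assuming the images are already aligned:

```python
import numpy as np

def to_color(r_band: np.ndarray, g_band: np.ndarray, b_band: np.ndarray) -> np.ndarray:
    """Stack three images taken through different filters into
    the channels of one RGB color picture."""
    rgb = np.stack([r_band, g_band, b_band], axis=-1).astype(float)
    return rgb / rgb.max()  # scale into [0, 1] for display

def coadd(exposures: list[np.ndarray]) -> np.ndarray:
    """Average many exposures of the same field: the noise averages
    down, so faint objects invisible in any single image emerge."""
    return np.mean(np.stack(exposures), axis=0)
```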

The full images will be made public two years after they are taken. Until then, their use is proprietary to scientists in the United States and Chile and to other contributors to the project. Once a year, the Rubin project, with the help of additional computing power in France and the United Kingdom, will reprocess all of the images and generate a more detailed catalog of its observations. That catalog also has a two-year proprietary period.

“We’re going to take all the data taken to date and combine it to squeeze as much scientific information out of it as possible,” said Yusra AlSayyad, who oversees image processing at Rubin. “Because we’re providing calibrated images and measurements of all the sources, it’s going to grow the data by a factor of 10.”

The final data release at the end of the 10-year survey, she said, could reach 500 million billion bytes.
