Big Data Algorithms, Languages Expand
The buzz around big data is spawning new algorithms, programming languages, and techniques at the speed of software.
Source: Rick Merritt
“Neural networks have been around for a long time. What’s new is the large amounts of data we have to run against them and the intensity of engineering around them,” said Inderpal Bhandari, a veteran computer scientist who was named IBM’s first chief data officer.
He described work using generative adversarial networks to pit two neural nets against each other to create a better one. “This is an engineering idea that leads to more algorithms — there is a lot of that kind of engineering around neural networks now.”
In some ways, the algorithms are anticipating tomorrow’s hardware. For example, quantum algorithms are becoming hot because they “allow you to do some of what quantum computers would do if they were available, and these algorithms are coming of age,” said Anthony Scriffignano, chief data scientist for Dun & Bradstreet.
Deep belief networks are another hot emerging approach. Scriffignano describes them as “a non-regressive way to modify your goals and objectives while you are still learning; as such, it has characteristics of tomorrow’s neuromorphic computers,” systems geared to mimic the human brain.
At Stanford, the DeepDive algorithms developed by Chris Ré have been getting traction. They help computers understand and use unstructured data like text, tables, and charts as easily as relational databases or spreadsheets, said Stephen Eglash, who heads the university’s data science initiative.
“Much of existing data is un- or semi-structured. For example, we can read a datasheet with ease, but it’s hard for a computer to make sense of it.”
So far, DeepDive has helped oncologists use computers to interpret photos of tumors. It’s being used by the New York attorney general as a law enforcement tool. It’s also in use at a large number of companies working in different domains.
DeepDive is unique in part because “it IDs and labels everything and then uses learning engines and probabilistic techniques to figure out what they mean,” said Eglash.
While successful, the approach is just one of many algorithm efforts in academia these days. Others focus on areas such as computer vision or try to ID anomalies in real-time data streams. “We could go on and on,” said Eglash.
Some of the adrenaline rush for developers comes from getting their hands on interesting, real-world data sets. Darren Haas, head of Predix cloud engineering at GE Digital, claims that he has an edge here.
“I don’t think some people have ever had an opportunity to look at some of the data we have coming through,” said Haas. “I have petabytes of sensor data from airplanes, satellites and trains. If I correlate all three of these, I can tell you a lot of things, like whether farms or sequoia trees are looking healthy or sick.”
Haas joined GE from Apple, where he worked on one of the largest deployments of Hadoop for running jobs like Siri. He says that he has been able to attract talented programmers to GE from the likes of Amazon, Facebook, and Google. “When I show them what data sets we work on, they get stoked,” he said.
Indeed, GE’s software group in suburban San Ramon, California, has ballooned from 800 programmers a couple of years ago to about 2,000 today.
One of Haas’ personal passions is Go. It is one of a handful of compiled programming languages, along with Elixir and Erlang, now getting renewed attention from developers working on machine learning.
“In the implementation layer, I push my team to use Go,” said Haas, noting that GE regularly hosts Go meetups. “It’s compiled, it’s fast and runs everywhere, and memory management is better. I’m teaching my 12-year-old son Go. I think it’s the future.”
Compiled languages like Go are good for use with runtime environments.