Study shows widely used machine learning methods don't work as claimed
Source: University of California
Models and algorithms for analyzing complex networks are widely used in research and affect society at large through their applications in online social networks, search engines, and recommender systems. According to a new study, however, one widely used algorithmic approach for modeling these networks is fundamentally flawed, failing to capture important properties of real-world complex networks.
"It's not that these techniques are giving you absolute garbage. They probably have some information in them, but not as much information as many people believe," said C. "Sesh" Seshadhri, associate professor of computer science and engineering in the Baskin School of Engineering at UC Santa Cruz.
Seshadhri is first author of a paper on the new findings published March 2 in Proceedings of the National Academy of Sciences. The study evaluated techniques known as "low-dimensional embeddings," which are commonly used as input to machine learning models. This is an active area of research, with new embedding methods being developed at a rapid pace. But Seshadhri and his coauthors say all these methods share the same shortcomings.
To explain why, Seshadhri used the example of a social network, a familiar type of complex network. Many companies apply machine learning to social network data to generate predictions about people's behavior, recommendations for users, and so on. Embedding techniques essentially convert a person's position in a social network into a set of coordinates for a point in a geometric space, yielding a list of numbers for each person that can be plugged into an algorithm.
"That's important because something abstract like a person's 'position in a social network' can be converted to a concrete list of numbers. Another important thing is that you want to convert this into a low-dimensional space, so that the list of numbers representing each person is relatively small," Seshadhri explained.
Once this conversion has been done, the system ignores the actual social network and makes predictions based on the relationships between points in space. For example, if a lot of people close to you in that space are buying a particular product, the system might predict that you are likely to buy the same product.
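For readers who want to see the idea in concrete terms, here is a minimal Python sketch of the two steps described above: turning each person's position in a network into a short list of coordinates, then predicting from nearby points alone. Everything in it is an assumption made for illustration. The six-person network, the purchase record, and the function names embed and predict_purchase are hypothetical, and the spectral embedding used here (the top eigenvectors of the adjacency matrix) is just one simple way to get coordinates, not necessarily a method evaluated in the study.

```python
import numpy as np

# Hypothetical toy network: two groups of three friends (0-1-2 and 3-4-5)
# joined by a single friendship between person 2 and person 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected friendship


def embed(adjacency, dim):
    """Give each node `dim` coordinates from the top eigenvectors of the
    adjacency matrix (an illustrative spectral embedding)."""
    vals, vecs = np.linalg.eigh(adjacency)  # eigenvalues in ascending order
    top_vals, top_vecs = vals[-dim:], vecs[:, -dim:]
    return top_vecs * np.sqrt(np.abs(top_vals))


coords = embed(A, dim=2)  # one row of 2 numbers per person

# Hypothetical purchase record: 1 if the person bought the product, else 0.
bought = np.array([0, 1, 1, 0, 0, 0])


def predict_purchase(target, coords, bought, k=2):
    """Fraction of the k nearest points (excluding the target) that bought.
    Note that the network itself is no longer consulted, only the coordinates."""
    dists = np.linalg.norm(coords - coords[target], axis=1)
    dists[target] = np.inf  # never count the target as its own neighbor
    nearest = np.argsort(dists)[:k]
    return float(bought[nearest].mean())


score = predict_purchase(0, coords, bought)
print(score)  # 1.0: both of person 0's nearest points bought the product
```

In this toy case the prediction for person 0 rests entirely on who sits nearby in the two-dimensional space, which is why any network structure the coordinates fail to capture is invisible to the downstream system.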
Seshadhri and his coauthors demonstrated mathematically that significant structural aspects of complex networks are lost in this embedding process. They also confirmed this result empirically by testing various embedding techniques on different kinds of complex networks.