UTILIZING CLIENT DATA IN THE ERA OF PRIVACY

Ornit Shinar

Director & Israel Lead, Venture Investing

Avi Arnon

Vice President, Venture Investing

Published on October 28, 2021
The opinions expressed in this blog are solely the authors’ and do not reflect the views of Citi.


HIGHLIGHTS

Companies must leverage data-driven strategies to compete in the digital age—however, new privacy laws restrict how client data can be gathered and used.
Several new tools and techniques have emerged to help enterprises share and analyze data while maintaining privacy—including differential privacy, multi-party computation, and synthetic data.
While all three technologies solve important data sharing problems, synthetic data holds the most promise for enabling the secure use of artificial intelligence and machine learning.

Data is an essential resource for companies that want to compete in the digital age. To stay relevant, modern enterprises must leverage data-driven strategies to acquire new customers, cater to their needs, and ensure effective communication.

The finance industry collects and uses a tremendous amount of client data for everything from real-time customer insights to accurate risk analysis and process automation. Maximizing the potential of this data requires financial institutions (FIs) to connect to and collaborate on it—however, that can be hard for incumbent FIs in particular, as that data is usually collected and held by various systems across the enterprise. This often results in data silos that restrict collaboration, cause quality issues and format inconsistencies, and most importantly prevent business leaders from getting a holistic view of the business.

This issue is compounded by increasing privacy regulation around the world. As more social and economic activities take place online and consumers grow more concerned about their privacy, laws such as Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are putting control of personal data back in consumers’ hands. This makes data protection a key priority for organizations, as failing to comply with these regulations can have severe financial repercussions.

Fortunately, several new tools, technologies, and startups are helping companies analyze and share client data while protecting clients’ privacy.

Challenges of Data Sharing

Historically, companies have leveraged technologies such as data anonymization—removing or encrypting sensitive information within a dataset—to protect client privacy. However, these legacy practices are insufficient today: in a recent study, researchers were able to identify 90% of individuals in a consumer credit card database from just four random pieces of information.
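
To illustrate why a handful of data points can be enough to single someone out, here is a minimal Python sketch. It is a toy, not the cited study's method or data: the `unicity` function, the customer IDs, and the randomly generated purchase traces are all assumptions made for illustration. The traces contain no names or account numbers, only pseudonymous IDs and purchase points.

```python
import random


def unicity(traces, k, rng):
    """Fraction of people singled out by k randomly chosen points from their own trace."""
    singled_out = 0
    for person, trace in traces.items():
        # The k points an attacker happens to know about this person.
        known = set(rng.sample(sorted(trace), k))
        # Everyone in the dataset whose trace contains all of those points.
        matches = [p for p, t in traces.items() if known <= t]
        if matches == [person]:
            singled_out += 1
    return singled_out / len(traces)


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
# Made-up data: 200 pseudonymous customers, each with 20 purchases drawn from
# 1,500 possible (shop, day) combinations, encoded here as integers.
traces = {f"customer_{i}": set(rng.sample(range(1500), 20)) for i in range(200)}

for k in (1, 2, 4):
    print(f"{k} known point(s): {unicity(traces, k, rng):.0%} of customers singled out")
```

In this toy dataset a single known purchase rarely isolates anyone, while four known purchases isolate almost everyone, even though every direct identifier has been removed.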

Data collaboration within organizations is also challenged by data breaches, which impact thousands of companies per year at an average cost of $3.86 million each. Data breach statistics show that attackers are highly motivated to obtain personal and financial information, which they use in attacks ranging from data ransom to identity theft.

It’s not surprising, therefore, that many businesses and data owners are currently limiting the data they share internally. Though this helps protect confidential information, it also keeps vast amounts of valuable data locked from business use. Even when data owners decide that the benefits of sharing data outweigh the risks, the approval process often takes months—and sometimes ends in the request being denied.

Technological Solutions and Providers

In recent years, several technology companies have tackled these challenges through solutions that help enterprises facilitate data sharing while reducing friction and mitigating data privacy risks. These solutions generally follow one of three approaches:

Differential Privacy
Differential privacy protects the information of individual clients by limiting user access and analysis to aggregated statistical data and by injecting calibrated “noise” (i.e., random values), so that the output looks nearly the same whether or not any one person’s record is included. For example, when analyzing Netflix user ratings using differential privacy, one could learn that roughly 10 million people gave the series “Stranger Things” five stars but would not be able to re-identify the score given by specific people. The noise can be added to query results, to the dataset itself, or to a model trained on it.
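
As a rough illustration of noise injection, the Python sketch below applies the Laplace mechanism, one standard way to make a counting query differentially private. The function names, the made-up ratings, and the privacy parameter `epsilon=0.5` are assumptions chosen for the example, not settings used by any vendor named in this article.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(ratings, target, epsilon, rng):
    """Return a noisy count of the ratings equal to `target`.

    Adding or removing one person's rating changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon makes this single
    query epsilon-differentially private.
    """
    true_count = sum(1 for r in ratings if r == target)
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2021)  # fixed seed only so the sketch is repeatable
ratings = [5] * 1000 + [4] * 300 + [1] * 50  # made-up star ratings
noisy_five_star = private_count(ratings, target=5, epsilon=0.5, rng=rng)
print(f"true count: 1000, released count: {noisy_five_star:.1f}")
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means more accurate answers. Real deployments also track the total privacy budget spent across many queries, which this sketch leaves out.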

Differential privacy holds particular promise for FIs given the amount and sensitivity of the data we collect. The technology also supports data collaboration both internally and externally, as several parties can create a joint database in order to generate insights and value across businesses and geographies without exposing their sensitive data. Startups such as Infosum, LeapYear, and Privitar offer solutions that leverage differential privacy to help enterprises aggregate and provide access to sensitive data without risking client privacy.

Multi-Party Computation
Multi-Party Computation (MPC) uses cryptography to help parties with limited trust between them share and analyze data without revealing the underlying dataset. Also called “secure computation,” it allows different companies to benefit from each other’s datasets by sending encrypted queries that keep both the query parameters and the results confidential. For example, Financial Crime and Compliance teams in different FIs can use MPC to securely share information while complying with privacy and financial regulation. Emerging startups including Duality Technologies and ENVEIL offer MPC solutions.
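
One simple building block of MPC is additive secret sharing, sketched below in Python. This is an illustration under stated assumptions rather than how any named vendor works: the function names, the modulus, and the example figures are hypothetical, all parties are simulated in a single process, and every party is assumed to follow the protocol honestly.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (an arbitrary large prime)


def make_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Add up every party's input without any party revealing its own value."""
    n = len(private_values)
    # Each party splits its input and hands one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the j-th share it received from each party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published and combined into the total.
    return sum(partial_sums) % PRIME


# Made-up example: three institutions each hold a private count of flagged accounts.
flagged_counts = [12, 7, 30]
print(secure_sum(flagged_counts))  # 49
```

Each individual share is a uniformly random number, so no single share says anything about the input it came from; only the combined total is revealed.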

Synthetic Data
Synthetic data is a relatively new and fundamentally different approach to utilizing data while preserving privacy. Instead of masking or encrypting real records, synthetic data companies use machine learning (ML) to analyze real datasets and generate artificial ones that mimic their statistical characteristics and diversity. Classic data anonymization techniques block access to all information classified as “private,” such as names and IP addresses. Synthetic data, by contrast, allows companies to leverage full datasets, improving accuracy and usability while helping teams work more collaboratively and efficiently, because the records they work with do not belong to real clients.

Synthetic data is therefore especially useful for sharing data with third parties. For example, if a company wants to test a new technology that can predict customer churn, rather than sharing real customer data, it can share synthetic data generated from that data, which carries far less risk of re-identification. Startups such as Mostly.ai, Hazy, Datomize, Tonic, and Gretel.ai are helping companies do just that.
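
The basic idea of learning from real data and then sampling artificial records can be sketched in a few lines of Python. This is a deliberately simple stand-in: it fits a two-column Gaussian model, whereas commercial tools use far richer ML models, and a plain statistical fit like this offers no formal privacy guarantee on its own. The column meanings (account balance and monthly spend), the function names, and the "real" data are all made up for illustration.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Learn the means, standard deviations, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(model, n, rng):
    """Draw n artificial rows that follow the fitted means, spreads, and correlation."""
    mx, my, sx, sy, rho = model
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + sx * z1, my + sy * (rho * z1 + residual * z2)))
    return rows


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
# Stand-in for a real dataset: (account balance, monthly spend) per customer.
balances = [rng.gauss(5000, 1500) for _ in range(2000)]
real = [(b, 0.1 * b + rng.gauss(0, 100)) for b in balances]

model = fit_gaussian(real)
synthetic = sample_synthetic(model, 5000, rng)
print("real correlation:     ", round(model[4], 2))
print("synthetic correlation:", round(fit_gaussian(synthetic)[4], 2))
```

The synthetic rows reproduce the relationship between the two columns, so an external vendor could build and test a model on them without ever seeing a real customer record.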

Overview of the data privacy landscape, highlighting key categories such as multi-party computation, synthetic data, and differential privacy, along with companies leading innovations in each area.

Source: Citi Ventures

AI-Based Models Require AI-Based Privacy

Last year, every person on Earth created an average of 1.7 megabytes of data per second. Amid this onslaught of information, businesses are increasingly using AI and ML technologies to extract insights. By extension, they will also need AI-based privacy solutions that can make that data available at scale. Of the technologies discussed above, synthetic data stands apart for its use of ML to achieve that scale and to help companies control the balance between privacy and utility: maximizing data security for external data sharing and maximizing utility for internal collaboration.

As the digital world grows increasingly complex, however, use cases will surely arise for all three technologies. With the amount of technology being developed today, the ability to share data internally, collaborate with external vendors, and evaluate new product offerings in a timely manner will be a key competitive advantage in improving a business’s products and services.

For more information, email Ornit Shinar at ornit.shinar@citi.com or Avi Arnon at avi.arnon@citi.com.
