Sharing data without sharing sensitive information
By | May 3rd, 2016 | Data Sharing


Every conversation I have about data sharing leads to a single point: the difficult decision about how and when to send information outside of a business’ systems.

There are plenty of good reasons why this is consistently brought up, including concerns about data being used for unintended purposes and fear of subpar security standards among recipients.

However, for every standard question there is an equally standard answer. Mine usually goes something like this:

“We at Data Sharing International use best-in-class encryption both at rest and in motion, all our systems are penetration-tested and hardened beyond the latest standards, and, besides, we’ve been doing this for 30 years and you can trust us.”

While all of these concerns are valid, the fact is that no matter what security measures you take, you can’t always avoid being the victim of a zero-day exploit or a coworker’s silly mistake. Look at Experian (where I used to work) – a company that pours substantial resources into security and is entrusted, with good reason, with large volumes of very sensitive data. Late last year a hacker gained access to one of their systems and was able to steal personal information about millions of consumers.

If it can happen to Experian, it can happen to any company.

By allowing your sensitive data to sit on another company’s systems, you are making the trade-off that the value you will gain by sharing this information will outweigh the risk of something bad happening. This sucks for a couple of reasons.

First, not unlike disaster planning, it’s extremely difficult to accurately estimate either the cost or the likelihood of a breach. This uncertainty means data sharing use cases are often dead on arrival, especially if you cannot easily and clearly point to the potential for a giant return. Second, it leads people both to over-hype the value of any particular data sharing project and to ask for every possible piece of data they can so as not to miss out on such a rare opportunity.

Given these challenges, why would anyone ever try to start a new data sharing company? One of the central things we are trying to do at XOR is to fix the problems we see with the data sharing infrastructure through new technologies. To this end, we have developed a technology we call distributed data exchanges. This technology allows organizations to share data without any of the most sensitive information leaving their systems.

The way this works is that the organization runs a piece of software on its internal systems. This software provides a front end to which authorized users can send information to match against. When it receives a request, the distributed exchange software checks whether any of the information in the request matches and, if so, returns the information needed to fulfill the request.
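To make the participant side a little more concrete, here is a minimal Python sketch of the kind of software described above. It is an illustration under stated assumptions, not XOR's actual implementation: the `ParticipantNode` class, its `handle` method, and the idea of a fixed list of shareable fields are all hypothetical names and design choices introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ParticipantNode:
    """Hypothetical stand-in for the software a participant runs in-house."""

    records: dict    # match key -> full internal record (stays on this system)
    authorized: set  # ids of callers that are allowed to send requests
    shareable: tuple  # the only fields that may ever be returned
    audit_log: list = field(default_factory=list)

    def handle(self, caller, match_key):
        """Answer one request, logging every attempt whether or not it succeeds."""
        allowed = caller in self.authorized
        record = self.records.get(match_key) if allowed else None
        self.audit_log.append({
            "at": datetime.now(timezone.utc),
            "caller": caller,
            "allowed": allowed,
            "matched": record is not None,
        })
        if not allowed:
            raise PermissionError(f"{caller!r} is not authorized")
        if record is None:
            return None  # no match, nothing to share
        # Return only the fields needed to fulfill the request (assumed whitelist).
        return {name: record[name] for name in self.shareable if name in record}
```

In this sketch the full record never leaves the node; a caller sees either nothing or the whitelisted fields, and the participant keeps its own log of who asked and when.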

So far, this sounds a lot like any regular API call. What makes it interesting is how these requests are routed and combined when there are many participants. In a distributed exchange, all requests are made through XOR, which then sends requests out to all the different participants. The participants return their individual results to XOR, where they are combined in an intelligent way and returned to the requestor. After the results have been returned, XOR removes all the individual information collected from the participants.
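The routing described above is essentially a fan-out followed by a merge. The Python sketch below shows that shape. Everything in it is an assumption made for illustration: participants are modeled as plain callables, `exchange_request` and `combine` are hypothetical names, and the combination rule (reporting only aggregate counts) is a placeholder for whatever "intelligent" combination the exchange really performs.

```python
from concurrent.futures import ThreadPoolExecutor


def combine(responses):
    """Assumed rule: report aggregate counts only, not which participant said what."""
    hits = [r for r in responses if r is not None]
    return {
        "participants_queried": len(responses),
        "participants_matched": len(hits),
    }


def exchange_request(participants, match_key):
    """Send one request to every participant, merge the answers, drop the originals."""
    # Each participant is a callable that takes the match key and returns
    # either a dict of shared fields or None when nothing matched.
    with ThreadPoolExecutor(max_workers=max(1, len(participants))) as pool:
        responses = list(pool.map(lambda ask: ask(match_key), participants))
    result = combine(responses)
    responses.clear()  # individual answers are not kept once they are combined
    return result
```

The requestor only ever sees the combined result, which is what allows the requestor and the participants to stay unknown to each other.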

There are a bunch of interesting consequences of this approach:

  • Neither the requestor nor the participant needs to know who the other party is, allowing some degree of anonymity.
  • A participant’s data is not retained anywhere except their own systems.
  • If someone decides they no longer want to share their data, they can just turn the software off.
  • A participant will know exactly when and what data is being accessed, and can control what they want it to be used for (assuming others don’t lie about their usage).
  • Risk of a mass compromise is drastically reduced as there is no central location where all data is stored.

There are also some tricky complications this adds:

  • Data sharing is no longer fire-and-forget. The participants need to maintain the software that allows the sharing.
  • Response times may be slower. In these schemes, many outside calls are needed to gather data.
  • Portions of the data may become unavailable as individual participants’ servers do not respond to requests.
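One common way to live with the last two complications is to give every request a deadline and return whatever has arrived by then, along with a note of who did not answer. The sketch below shows that idea with Python's `asyncio`. It is an assumption about how such an exchange could behave, not a description of XOR's product; `query_with_deadline` and the participant names in the usage example are hypothetical.

```python
import asyncio


async def query_with_deadline(participants, request, deadline_s):
    """Ask every participant at once; keep only answers that arrive in time.

    `participants` maps a name to an async callable taking the request.
    Returns (answers, missing): the replies received, and the sorted names of
    participants that timed out or raised an error.
    """
    tasks = {
        asyncio.create_task(ask(request)): name
        for name, ask in participants.items()
    }
    done, pending = await asyncio.wait(list(tasks), timeout=deadline_s)
    for task in pending:
        task.cancel()  # stop waiting on participants that are too slow
    answers = {}
    missing = [tasks[task] for task in pending]
    for task in done:
        if task.exception() is None:
            answers[tasks[task]] = task.result()
        else:
            missing.append(tasks[task])  # e.g. the participant's server is down
    return answers, sorted(missing)
```

A small usage example, with one responsive participant, one slow one, and one that is offline:

```python
async def fast(request):
    return {"match": True}

async def slow(request):
    await asyncio.sleep(1.0)
    return {"match": True}

async def offline(request):
    raise ConnectionError("participant offline")

answers, missing = asyncio.run(
    query_with_deadline({"a": fast, "b": slow, "c": offline}, {"id": "x"}, 0.1)
)
# answers holds the reply from "a"; missing lists "b" and "c"
```

The trade-off is explicit: a shorter deadline gives faster responses but more partial results, so the combined answer should say how many participants it is based on.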

Overall, I believe the advantages of a distributed exchange outweigh the challenges for most use cases, and we will be launching our first product that uses this new technology next week. If all of this is a bit abstract for you, stay tuned: you will be able to see a very concrete example of it in action very soon.

Greg Bonin, vice president of exchange solutions, brings to XOR more than 9 years of experience extracting value and insights from data by leveraging the fusion of technology and analytics. He has expertise in a wide variety of areas including consumer and business credit risk, fraud, marketing, healthcare and insurance. Greg was previously a founding member of the Data Lab, a research and development group within Experian, where he led a team of scientists.
