
Digital transformation has given enterprises greater access to consumer data than ever before. For example, businesses that encourage opt-ins to loyalty programs and newsletter lists, and that manage their databases well, can build a 360-degree view of how customers engage with their brands. They can use insights from that data to guide product enhancements, personalize service, and, overall, create better customer experiences.
Collecting consumer data sounds like a win-win, but there is a controversy surrounding it: Who does personally identifiable information (PII) belong to?
The General Data Protection Regulation (GDPR) made it clear that in the EU, PII belongs to the consumer. GDPR empowers consumers to control how their data is used, and, if they choose, to have data erased, giving them “the right to be forgotten.” Great for consumers, but a potential nightmare for database administrators.
How to Comply with the Right to Be Forgotten
At Postgres Vision 2020, Dr. Michael Stonebraker, MIT Professor and the original builder of Postgres, shared his insights about how to tackle the issue of consumers’ right to be forgotten.
“In my opinion, the easiest way to deal with the right to be forgotten is to view this as a database design problem,” Stonebraker says. PII could be anywhere in an enterprise, and finding it and deleting it can be a challenge. “And the minute I add anything, I change where you have to go to look to delete stuff. So, if you just force a clean entity relationship (ER) schema, you can make this problem wildly easier.”
He suggests using an ER design tool to construct a diagram of your data and automatically map it to a set of third normal form (3NF) tables in the database. For example, a company’s system could have an employee entity that contains employee names, salaries, and ages; a department entity that contains each department’s name and floor; and a “works_in” relationship that records that an employee can work in one or more departments. Each entity and relationship is mapped to a corresponding table.
For the right to be forgotten, assign surrogate keys to the entities, disallow user access to those keys, and disallow materialized views or copies. You can then erase a person by deleting his or her row from the entity table. Rows elsewhere in the database refer to the person only through the surrogate key, so once the row that maps the key to the personal data is gone, that data is no longer reachable.
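The employee, department, and works_in example can be sketched in a few lines. This is a minimal illustration and not Stonebraker’s own implementation: it uses Python’s built-in sqlite3 module in place of Postgres so that it runs anywhere, and the column names emp_id and dept_id and the function names build_schema and forget_employee are assumptions made for the example. It also does not enforce the rule that users cannot see surrogate keys; that would be handled through permissions in a real deployment.

```python
import sqlite3


def build_schema(conn):
    """Create 3NF tables for the employee / department / works_in example."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE employee (
            emp_id INTEGER PRIMARY KEY,   -- surrogate key (assumed name)
            name   TEXT NOT NULL,         -- PII lives only in this table
            salary REAL,
            age    INTEGER
        );
        CREATE TABLE department (
            dept_id INTEGER PRIMARY KEY,  -- surrogate key (assumed name)
            name    TEXT NOT NULL,
            floor   INTEGER
        );
        CREATE TABLE works_in (
            emp_id  INTEGER NOT NULL
                    REFERENCES employee(emp_id) ON DELETE CASCADE,
            dept_id INTEGER NOT NULL
                    REFERENCES department(dept_id),
            PRIMARY KEY (emp_id, dept_id)
        );
        """
    )


def forget_employee(conn, emp_id):
    """Erase one person by deleting a single row from the entity table."""
    # The cascade removes the matching works_in rows, so nothing in the
    # schema still points at the deleted surrogate key.
    conn.execute("DELETE FROM employee WHERE emp_id = ?", (emp_id,))
    conn.commit()
```

Because the name, salary, and age appear only in the employee table, one DELETE is enough inside this schema. The sketch says nothing about logs, backups, or copies held by other systems, which the rest of the article covers.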
“As long as you have a clean schema, deleting PII data is straightforward,” he says.
Database Management Hurdles on the Way to Compliance
Stonebraker points out, however, “Real-world DBAs often construct lousy schemas for performance reasons.” Those shortcuts may make queries go faster, but they can make the right to be forgotten difficult to honor. He comments that clean schemas are always a good idea, and that complying with the right-to-be-forgotten portion of GDPR or other consumer data protection regulations could be a way to force clean schemas on your users.
He adds that when applications, and subsequently schemas, must change, you could lose a clean schema in an attempt to make application maintenance easier. Once again, that makes the right to be forgotten much harder to grant.
Even if you can delete PII from your databases, Stonebraker comments that Postgres experts know the data is still in the log. He says that to really delete all PII, you’d have to update the log. “That’s a really dangerous thing to do. I don’t advise that at all,” he says. “You have to trust system administrators that they won’t leak the log.”
Furthermore, he points out, you probably have offsite copies for disaster recovery, so you need to determine how you will deal with deleting PII from crash recovery data onsite and offsite.
The Most Dangerous Right-to-Be-Forgotten-Related Issue
He says that those aren’t the most dangerous problems for database management regarding the right to be forgotten. Stonebraker explains that a lot of business units silo data and share it when needed to make their jobs easier. For example, one unit in charge of customer data may share PII with another unit in charge of supplier data, which then writes that information into its own database. Now there is a new copy of the PII.
“It’s a typical tactic in the legacy, data-siloed world, so copies of data are all over the enterprise,” he says.
It’s probably not practical to disallow applications from reading PII data, so you have to log whenever they read or write it. This requires sandboxing the application. Then, if you need to delete one person’s PII, you can use the log to find everywhere it went.
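A rough sketch of that kind of access log follows. Everything in it is hypothetical: the class names AccessEvent and PiiAuditLog, the method names, and the idea of identifying a storage location with a plain string are assumptions made for illustration, and the sketch leaves out the sandbox that would force applications to go through the log in the first place.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessEvent:
    app: str          # application that touched the data
    operation: str    # "read" or "write"
    subject_id: int   # surrogate key of the person whose PII was touched
    location: str     # where the data was read from or written to


@dataclass
class PiiAuditLog:
    events: list = field(default_factory=list)

    def record(self, app, operation, subject_id, location):
        """Append one read or write of a person's PII to the log."""
        if operation not in ("read", "write"):
            raise ValueError("operation must be 'read' or 'write'")
        self.events.append(AccessEvent(app, operation, subject_id, location))

    def locations_written(self, subject_id):
        """Every place a copy of this person's PII was written."""
        return sorted(
            {e.location for e in self.events
             if e.subject_id == subject_id and e.operation == "write"}
        )

    def apps_that_read(self, subject_id):
        """Every application that read this person's PII."""
        return sorted(
            {e.app for e in self.events
             if e.subject_id == subject_id and e.operation == "read"}
        )
```

When an erasure request arrives, locations_written gives the list of places to clean up, and apps_that_read gives the applications whose outputs may need a closer look.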
“It’s trickier than you think,” he says. You need to consider transformations, such as “John Smith” being stored as “J. Smith.” A user could write data into a lookup table, then copy data from it. Also, if data can be written to the screen, there’s nothing to prevent a user from writing it down and sending it to someone via email. “At some level, you have to trust users, or you can’t let them access data,” he says.
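To make the transformation problem concrete, here is a deliberately crude sketch that treats two names as possible matches when they share a surname and a first initial. The functions name_key and could_be_same_person are hypothetical, and the heuristic is an assumption chosen for brevity. It will also flag “Jane Smith” as a possible match for “John Smith,” so its output is a list of candidates for a person to review and not a deletion list.

```python
def name_key(full_name):
    """Reduce a name to (first initial, surname), both lowercased."""
    # Treat periods as spaces so "J. Smith" splits into ["J", "Smith"].
    parts = full_name.replace(".", " ").split()
    if not parts:
        return None
    return (parts[0][0].lower(), parts[-1].lower())


def could_be_same_person(name_a, name_b):
    """True when two names share a first initial and a surname."""
    key_a = name_key(name_a)
    key_b = name_key(name_b)
    return key_a is not None and key_a == key_b
```

Real matching would have to handle nicknames, middle names, reordered fields, and typos, which is part of why finding every copy is harder than it looks.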
This Isn’t Just a GDPR Problem
Since the EU began enforcing GDPR in 2018, regulated enterprises have been searching for the most efficient and effective ways to comply. Moreover, legislators are enacting new regulations, such as the California Consumer Privacy Act (CCPA), that grant people the right to erase their PII, making this a more widespread database management challenge.
If you or your clients are not currently impacted by these laws, you likely will be in the future. The debate over who owns PII is repeatedly coming down on the side of the consumer. If your application uses consumer data, it’s time to determine how you will adapt database management to comply.