Podcast

AI Exacerbates Data Minimization Challenges


March 19, 2026

As AI tools, workplace monitoring technologies, and data‑driven platforms expand, organizations face growing exposure from collecting, sharing, and retaining more data than they need for defined purposes. In this episode of We get Privacy for work, Jackson Lewis Principals Joe Lazzarotti and Damon Silver, co‑leaders of the firm’s Privacy, AI, and Cybersecurity group, break down the data minimization principle and how organizations can use it to better manage data breach, class action litigation, and regulatory risk.

Transcript

Joe Lazzarotti

Principal, Tampa

Welcome to the We Get Privacy podcast. I'm Joe Lazzarotti, and I'm joined by my co-host, Damon Silver. Damon and I co-lead the Privacy, AI, and Cybersecurity group at Jackson Lewis. In that role, we receive a variety of questions every day from our clients, all of which boil down to the core question of how we handle our data safely. In other words, how do we leverage all of the great things data can do for our organization without running headfirst into a wall of legal risk? How can we manage that risk without unnecessarily hindering our business operations?

Damon Silver

Principal, New York City

On each episode of the podcast, Joe and I talk through a common question that we're getting from our clients. We talk it through in the same way that we would with our clients, meaning with a focus on the practical. What are the legal risks? What options are available to manage those risks? What should we be mindful of from an execution perspective?

Our question for today is how we comply with the data minimization principle and reap the benefits of doing so when, at the same time, we want to be using all these data-hungry AI tools like chatbots, AI meeting assistants and productivity monitoring tools.

Joe, to level set, as we get started here, could you maybe talk a little bit about what data minimization is and what some of the benefits of it are?

Lazzarotti

It's referred to in different ways. If you're familiar with HIPAA regulations, you may know it as the minimum necessary rule. There was also guidance issued under the CCPA, the California Consumer Privacy Act, back in 2024, which referred to it simply as data minimization: the idea that businesses should ask, at each point along the life cycle of data, what the purpose is and whether they are using only what they need to achieve that purpose. It's a very high-level question, and there's more to it than that when you read through the guidance, but the essence is that businesses must use information in a way that's reasonably necessary and proportionate to those purposes, or to purposes compatible with them, and not go beyond that.

The idea, among other reasons we can get into, is that we want our data footprint to be as minimal as possible. That might facilitate our ability to respond to requests from consumers about their rights, minimize the impact of a data breach, and provide a number of other benefits. In essence, that's how regulators think about data minimization and the minimum necessary rule.

Silver

Absolutely. This has become more of a regulatory focus. The CPPA, one of the agencies that can enforce the CCPA in California, has issued enforcement advisories on this issue. It's definitely something that under the GDPR and under HIPAA, as you mentioned, has been an important concept that has received regulatory focus for a number of years now.

Even beyond that regulatory risk, if you think about a scenario where you have a data breach, the more you have abided by the data minimization principle, the less likely it is that you have unnecessary volumes of PII or PHI that trigger notification reporting obligations.

Then, looking even further downstream, if you look at the number of potential plaintiffs in a class action lawsuit, you probably are going to be in a better position to narrow down that class of people if your organization has been focused on minimization, starting with the collection of data and then going all the way through the life cycle to retention of data.

Joe, before we jump into AI, which introduces some new wrinkles, maybe we can just talk about a couple more traditional examples of common scenarios we see where an organization would benefit from being more mindful of the data minimization principle.

Just to throw one out there, I've definitely seen an employee reach out to another and say, "I need data related to this set of customers or this set of employees." The recipient of that email has a spreadsheet with information on the 50 people being asked about, but it also contains information about 1,500 other people. The requester is authorized to have that data, or at least to access it, generally speaking, so the recipient does what's easiest, which is just to send over the full spreadsheet. That's a common scenario: many people don't realize they're flagrantly violating the minimum necessary principle because they're transmitting dramatically more PII than is relevant to the task.
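To make the spreadsheet scenario concrete, here is a minimal, hypothetical sketch of the minimization-friendly alternative: extract only the requested records, trimmed to the needed columns, before sharing. The field names, IDs and column choices are invented for illustration and are not from the discussion.

```python
# Hypothetical illustration: share only the requested rows and columns,
# instead of forwarding the whole spreadsheet.
def extract_requested_rows(source_rows, requested_ids, id_field="employee_id",
                           needed_fields=("employee_id", "department")):
    """Return only the requested records, trimmed to the needed columns."""
    wanted = set(requested_ids)
    return [
        {field: row[field] for field in needed_fields}
        for row in source_rows
        if row[id_field] in wanted
    ]

# A tiny stand-in for the "1,500 extra people" spreadsheet.
rows = [
    {"employee_id": "E1", "department": "HR", "ssn": "###-##-0001"},
    {"employee_id": "E2", "department": "Finance", "ssn": "###-##-0002"},
    {"employee_id": "E3", "department": "IT", "ssn": "###-##-0003"},
    {"employee_id": "E4", "department": "Legal", "ssn": "###-##-0004"},
]
shared = extract_requested_rows(rows, ["E2", "E4"])
print(shared)
# Only the two requested people are shared, and the SSN column is
# dropped entirely because it was not needed for the task.
```

The point of the sketch is simply that the minimizing step is a small amount of extra effort compared with forwarding the full file.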

Are there other traditional, non-AI examples that come to mind where, intuitively, someone thinks they're doing something that's totally fine, but it poses a big issue from a data minimization perspective?

Lazzarotti

No, that's a great example. In your example, and in a lot of examples, it's just a mindset. If you go into a small business in a strip mall, you'll often see the owner has posted a $5 bill with the word "congratulations" on it, saved from the very beginning of the business. To me, it's not unlike going into an office where the client says, "I have the first document we ever created for our business and everything since." Sometimes, businesses are proud of the fact that they have all of their data; there's a comfort in saving every piece of it. Along those lines, another example may be a company keeping off-site archives of data that go back 30 or 40 years. The risk is maybe a little muted because the data is in paper format. Nonetheless, it's data you're holding onto with no record retention obligation requiring you to keep it. There's actually a cost to maintaining it in that format, but companies are willing to pay it just for that comfort.

A more practical example, with a little more risk, is legacy data, or data that you acquire in connection with a transaction and may not need from the business you acquired, where there's no effort to go through and ask, okay, what do we really need, and let's get rid of everything else. It happens for some of the same reasons you mentioned: it's the easy thing to do right now, or we'll get to it later. It's one of those things that always seems to find its way to the bottom of the list; it's not an issue at the forefront of companies' minds. That's just another example I've come across of violations of that rule.

Silver

Absolutely. Just to add another motive to the mix: I've definitely spoken with quite a few clients, usually people in compliance or legal, who recognize the risk of this practice but are trying to navigate the opinions and requests of various business units. There is this fear that there might be something in, say, an email box that could be useful at some point in time. What if I need to reference that communication from the negotiation with this business partner 12 years ago? I want to make sure the data isn't lost. There is a challenge with that.

As with legacy data, there's also the challenge of deciding whether to archive or purge it; you want to first establish a process to ensure you are holding on to the stuff that's actually useful. That process is daunting. You need to do a whole data mapping exercise to figure out what you have, what you might need and where it should be stored. Should it be in an archive? Should it be part of your live system? You should set up some kind of structure to organize it. All of that makes sense in theory, but in practice it takes a lot of time and money, and at the end of the day, people want what's easy. They want what's efficient, and they don't want to risk something useful to them being unavailable.

On that point, Joe, you could talk a little bit about this. I know we recently talked to some people who are in-house in high-level privacy roles about this topic. One thing they noted, which is a great point, is that beyond mitigating your risk of privacy or data security violations, you might also be missing out on opportunities to better use your data if you aren't thinking about data minimization. Because you're just going to have so much data and so little insight into it that you're probably missing out on stuff. Do you want to touch on that point a bit more?

Lazzarotti

To build a segue, Damon, to the AI piece: absolutely, the cleaner your data is, the more pristine, focused and targeted, the more useful it will be for various AI use cases. As a related point, there's almost an added layer of data minimization, because it's not just about deleting. It might be redacting. It might mean saying, "maybe we don't get rid of all the data, but there's a lot of it we don't need." If you have data embedded with Social Security numbers, you might still want to save it. You might still have a use for it, but you could substantially reduce the risk by eliminating certain data points, controlling the risk while preserving the value. In some ways, that could be the effect of what you're talking about, in terms of being more deliberate about how you maintain and minimize data.
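The redaction idea mentioned here, keeping the record but stripping out the highest-risk data points, can be sketched very simply. This is a minimal illustration, not a production redaction tool: the pattern covers only the common US SSN format, and the placeholder text is an assumption.

```python
import re

# Minimal sketch: keep the business record, but replace anything shaped
# like a US Social Security number (XXX-XX-XXXX) with a placeholder.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text: str, placeholder: str = "[REDACTED-SSN]") -> str:
    """Replace SSN-shaped strings with a placeholder."""
    return SSN_PATTERN.sub(placeholder, text)

record = "Employee note: verified ID for 123-45-6789 during onboarding."
print(redact_ssns(record))
# The business content survives; the high-risk identifier does not.
```

Real-world redaction would need to handle other formats and identifier types, but the design choice is the one described above: preserve the data's value while removing the data points that drive most of the breach risk.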

From the standpoint of your question, what could we do to the data to make it more valuable to us while at the same time minimizing it and complying with these requirements? From that perspective, what thoughts do you have about how that principle can actually be applied in practice with some of the AI tools you mentioned earlier?

Silver

Taking the data minimization challenges one at a time, start with chatbots. One thing we've definitely seen is that some of these chatbots, in order to be as useful as possible, are integrated with other applications. For example, Copilot Enterprise can be integrated with Word, Excel and PowerPoint. It allows users to pull information not just from the prompt they manually enter, but also from those other applications, which sounds great. Then you take a step down the road and ask, what does this user have access to in Word and Excel? Are we confident that we have put access controls in place so that when a certain employee in HR or finance asks a question that could result in data being pulled from these applications, they're only getting data we really want them to have access to, and that's relevant to whatever they're doing? Again, people, by nature, are going to do whatever is quickest and easiest. They're not necessarily going to ask themselves, am I processing way more data through this AI tool than is necessary for what I'm doing? They may unintentionally get way more data than they anticipate, simply because permissions aren't as tight as they need to be.
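The access-control concern above can be illustrated with a toy sketch: before an AI tool pulls a document into a response, check that the requesting user's role is actually permitted to see it. The roles, categories and file names here are all invented for the example; real deployments would rely on the platform's own permission model rather than a hand-rolled one.

```python
# Toy role-based permission check, invented for illustration only.
PERMISSIONS = {
    "hr_analyst": {"hr_docs"},
    "finance_analyst": {"finance_docs"},
}

def fetch_for_prompt(user_role, documents):
    """Return only documents whose category the role may access."""
    allowed = PERMISSIONS.get(user_role, set())
    return [doc for doc in documents if doc["category"] in allowed]

docs = [
    {"name": "salaries.xlsx", "category": "hr_docs"},
    {"name": "forecast.xlsx", "category": "finance_docs"},
]
# An HR analyst's prompt should never pull the finance forecast.
print(fetch_for_prompt("hr_analyst", docs))
```

The takeaway matches the discussion: if permissions like these are looser than intended, the AI tool will happily surface everything the user can technically reach.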

Also, on the back end of that, you prompt an AI chatbot and get its output. Do you then turn around and email that to other people on the team and say, I just did this analysis and sorted through some of our data. This will be a useful document for us to work from. Say the document contains PII, and it's now being sent to four different people's email accounts. Two of those people then save it locally or to different places. All of a sudden, you have seven copies of the same PII. You thought, based on your previous data mapping, you only had one or two, but now you have seven. That increases your data breach footprint. It's a data minimization issue. It just starts to spiral a bit in terms of your control over your data environment.

Joe, looking at the AI meeting assistant, what issues are you seeing on the data minimization front?

Lazzarotti

One popular example, just a practical use of the tool: many tools are configured so that when you send out a meeting invite, an agent on the call transcribes it. At the end of the call, the agent creates a transcript of everything discussed and may create a summary. Either way, there might be data in it, whether confidential business information, PII or PHI, and now everybody on that call gets a copy. Maybe that's okay from an access perspective. It's interesting what you were saying about people who might not realize what resources they can reach once they put a prompt in; by doing data minimization, you can also limit the risk of data breaches and unauthorized access, because you've done that due diligence. Here, in a way, it's reversed: you're giving people information you know they have access to, but you're putting it into repositories that might be vulnerable and creating more copies of data than you need. That process alone is like copying a bunch of people on an email without good hygiene around how those emails ought to be handled. That's one issue I see with that application.

Silver

Absolutely, and just to add a couple more. One is that we've spoken with some clients who didn't realize that the transcription function uses voice or facial recognition to identify the various speakers. That is an unintended collection of biometric information that the organization, if it knew about it, probably would not have wanted, or at the very least, if it were okay with the collection, it would need to look at whether it has provided proper notice, collected consent and addressed all of those considerations.

Then another issue that comes up quite a bit is just an extraneous collection of information in the transcripts, because people sometimes, before the meeting or in a break in the meeting, or at the end of the meeting, will discuss things that have nothing to do with the meeting and that potentially could be sensitive in nature. Someone could talk about their upcoming medical procedure or something related to their religious or political beliefs. Certainly, that was not the subject of the meeting. No one would have taken that down in their manually taken notes, but all of that is going to be in the transcript.

To your point earlier, Joe, about redaction as a potential data minimization tool: if you are going to allow AI meeting assistants at scale, you're going to want to think about whether you need a process for prompt review of those transcripts and redaction of material that is irrelevant to the business meeting and that may be sensitive in one way or another, whether from a PII perspective or a company confidential information perspective. You want to make sure you're not creating all these transcripts that, one, could be breached and, two, could be discoverable in litigation; think of the impact on ESI costs. There does need to be a framework in place to ensure you are not creating huge reams of additional information that you then need to manage.
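A first pass at the transcript-review process described here might flag segments that mention terms suggesting sensitive, off-topic content, so a human can review and redact them. This is a deliberately naive sketch: the keyword list and segment format are invented, and real review would need far more than keyword matching.

```python
# Naive keyword screen for transcript segments, for illustration only.
SENSITIVE_TERMS = {"surgery", "diagnosis", "religion", "political"}

def flag_segments(segments):
    """Return (speaker, text) segments that mention a sensitive term."""
    flagged = []
    for speaker, text in segments:
        words = {w.strip(".,!?").lower() for w in text.split()}
        if words & SENSITIVE_TERMS:
            flagged.append((speaker, text))
    return flagged

transcript = [
    ("A", "Let's review the Q3 budget numbers."),
    ("B", "Before we start, my surgery is scheduled for Friday."),
    ("A", "Okay, back to the agenda."),
]
print(flag_segments(transcript))
# Only the off-topic medical remark is flagged for human review.
```

Even a crude screen like this makes the larger point: the organization needs some systematic step between "transcript created" and "transcript retained," or the sensitive asides pile up unmanaged.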

Lazzarotti

Just one more example to talk about before we close it out, Damon. I know we've both talked about these performance management platforms that analyze employee activity, maybe even keylogging records and screenshots of what the employee is seeing on their screen. These are deployed for several reasons. One is to get a sense, particularly for remote workers, of how active and productive employees are, based on criteria the employer has established. Thinking about that, there's a lot of data we were never able to collect before these platforms were available, and now there's an effort to keep records of it. The system will just create, collect and retain as much of that information as you want.

One of the questions that comes up a lot is who's viewing that data, right? Just stepping aside from data minimization for a moment. How do you deal with the people who are actually in charge of the collection, what to collect, what to retain, what to disclose, and the monitoring of the monitors, if you will? In that process, Damon, can you talk a little bit more about these performance management platforms and how, because AI is built into them, they help assess and rank productivity levels when deployed?

Silver

This is yet another great example of the general problem: organizations want data because they see its value, but they don't always know at the front end exactly how it will be useful or what data they need. So there is a tendency to just collect anything that might potentially be useful. Maybe it will end up being helpful to log every keystroke employees make, or maybe it'll be useful to know every website they're looking at. Some AI tools "need all that data" to create rankings and scores. That seems fine in principle because, as an employer, you usually have pretty broad rights to monitor activity on your own systems, provided you have good language in your acceptable use policy or elsewhere saying that you might monitor these systems and that employees should have no expectation of privacy in anything they do on them.

From a data minimization perspective, even if you had permission to collect the data, it doesn't mean that you're relieved of your obligation to consider data minimization, nor are you relieved of your obligation to secure that information on the back end. Looking again at the data breach footprint, if you're collecting all this data, you're not even going to know how many times you're collecting it, because a lot of it depends on what your employees are doing. You have to figure out a way to put safeguards around the data you collected, or you're creating a pretty significant risk. At some point, that risk is going to outweigh the benefits you're getting from having more insight into what your employees are up to.

Lazzarotti

We covered a bunch, and I know there's a lot more we don't necessarily have time for today. Damon, thanks so much for joining me today. It's a good discussion for our listeners. If you have any questions or suggestions for a future program, please reach out to us at Privacy@JacksonLewis.com. Thanks again, Damon.

© Jackson Lewis P.C. This material is provided for informational purposes only. It is not intended to constitute legal advice nor does it create a client-lawyer relationship between Jackson Lewis and any recipient. Recipients should consult with counsel before taking any actions based on the information contained within this material. This material may be considered attorney advertising in some jurisdictions. Prior results do not guarantee a similar outcome. 

Focused on employment and labor law since 1958, Jackson Lewis P.C.’s 1,100+ attorneys located in major cities nationwide consistently identify and respond to new ways workplace law intersects business. We help employers develop proactive strategies, strong policies and business-oriented solutions to cultivate high-functioning workforces that are engaged and stable, and share our clients’ goals to emphasize belonging and respect for the contributions of every employee. For more information, visit https://www.jacksonlewis.com.