Data Collection and Privacy for SaaS APIs

The data collected by SaaS APIs under the hood directly affects the privacy and terms of service (ToS) of consuming applications. In this post, I'll cover the basics of data collection hygiene for SaaS APIs, which is essential to enable a transparent integration and operational experience for customers.

Transparency

Let's get the obvious stuff out of the way. Transparency is of utmost importance for SaaS APIs that collect data.

  • Document a clear, detailed list of data collected by APIs (or by the wrapper SDKs).

  • More importantly, document why the data is collected, why it is necessary, and how it surfaces in the product offering.

  • Document who can see the data (data sharing).

  • Open-sourcing your SDKs can go a long way toward making your policies evident and verifiable.

  • A variety of services specialize in software supply chain security (S3C). Certifying your privacy and data collection posture through one of these services will make your offering more attractive to larger, more risk-averse customer segments.
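
One way to keep this documentation honest is to maintain it as a machine-readable manifest next to the SDK code and render the public docs from it. The sketch below is a minimal illustration only: `CollectedField`, `MANIFEST`, `render_manifest`, and the field entries are hypothetical names and values, not taken from any real SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CollectedField:
    """One entry in a hypothetical data-collection manifest."""
    name: str                     # what is collected
    purpose: str                  # why it is collected and where it surfaces
    required: bool                # whether the API can function without it
    shared_with: Tuple[str, ...]  # who can see the data


# Illustrative entries only; not taken from any real SDK.
MANIFEST: List[CollectedField] = [
    CollectedField("device_model", "Group crash reports by device", False, ("customer",)),
    CollectedField("install_id", "Count unique installs in the dashboard", True, ("customer", "vendor")),
]


def render_manifest(fields: List[CollectedField]) -> str:
    """Render the manifest as plain-text documentation, one line per field."""
    lines = []
    for f in fields:
        kind = "required" if f.required else "optional"
        lines.append(f"{f.name} ({kind}): {f.purpose}. Visible to: {', '.join(f.shared_with)}.")
    return "\n".join(lines)


print(render_manifest(MANIFEST))
```

Keeping the "what", "why", and "who" in one structure means a new field cannot ship without someone filling in its purpose and audience.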

Compliance

Of paramount importance while collecting data as a SaaS API is the legal posture you assume. Data collected by APIs and how it gets used can have varying implications in different geographies. Here are some examples.

  • Payment Card Industry (PCI) standards: While not the law (I'm not a lawyer), collecting credit card (more generally, payment instrument) information in your application or API requires that you follow specific standards. The Play Store and App Store terms of service codify and enforce this policy on apps.

  • General Data Protection Regulation (GDPR): Understanding GDPR in depth requires a legal expert (I'm not that person). Broadly, it regulates the collection, storage, and sharing of personal data. The law covers a level of detail that can often be hard to understand for consumers of your API. In such cases, APIs should go above and beyond to provide data collection controls that are both compliant and usable.

  • Data localization and governance laws: Several governments require certain data to be stored within national borders. While the principles and value offered by these laws are beyond the scope of this post, APIs in violation may be putting themselves, their customers, and their customers' customers at risk.

Play Store and App Store policies

The recent focus on data collection in the App Store and Play Store reflects the growing body of law and awareness around privacy. Apple introduced privacy "nutrition labels" for the App Store, and Google introduced the Data safety section for Google Play. Both allow consumers of an application to understand what data is collected by an app and how it's used. As the developer of a SaaS API, documenting your data collection as outlined in the sections above will help your customer developers check the right set of privacy labels so that their mobile apps can be safely distributed via these stores. As a reference, Firebase does an excellent job of documenting data collected on different mobile application platforms as it relates to the labels defined by Apple and Google.

Control plane

SaaS APIs should broadly provide two mechanisms for customers to configure data collection.

  1. Static/Build time configuration: Customers should be able to set a default on/off for all types of data that the API collects before the application binary is built and distributed. This default behavior directly affects the Terms of Service of the app.

  2. Dynamic/runtime configuration: Consumers of applications should be able to opt into/out of specific data collected about them. APIs that allow runtime configuration of these policies will enable apps to stay compliant.
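
The two mechanisms can be layered: build-time defaults form the base, and runtime choices override them per data type. The sketch below is a minimal illustration under assumptions of my own. `CollectionPolicy` and the data type names are hypothetical, and the rules that a runtime choice always wins and that unknown data types default to off are design choices, not requirements from any store or law.

```python
from typing import Dict, Optional


class CollectionPolicy:
    """Resolves whether a data type may be collected.

    Build-time defaults come from the app developer; runtime overrides
    come from the end user's opt-in/opt-out choices.
    """

    def __init__(self, build_defaults: Dict[str, bool]) -> None:
        # Copy so later changes to the caller's dict cannot alter the defaults.
        self._defaults = dict(build_defaults)
        self._overrides: Dict[str, bool] = {}

    def set_user_choice(self, data_type: str, enabled: Optional[bool]) -> None:
        """Record a runtime choice; None clears it and restores the default."""
        if enabled is None:
            self._overrides.pop(data_type, None)
        else:
            self._overrides[data_type] = enabled

    def is_enabled(self, data_type: str) -> bool:
        """Runtime choice wins over the build default; unknown types are off."""
        if data_type in self._overrides:
            return self._overrides[data_type]
        return self._defaults.get(data_type, False)


# Usage: defaults fixed at build time, then a user opts out at runtime.
policy = CollectionPolicy({"analytics": True, "location": False})
policy.set_user_choice("analytics", False)
print(policy.is_enabled("analytics"))
```

Every collection call site then asks the policy before reading or sending a value, which keeps the opt-out logic in one place.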

Tips for expressing optionality

While this is an implementation detail, the optionality of data can quickly lead to a messy implementation. I hope to share a couple of fundamental mechanisms that have allowed Firebase to manage this effectively. Big shout out to my colleagues David Poll, Shyam Jayaraman, Vladimir Kryachko, Ryan Wilson, and Paul Beusterien for pioneering these concepts. They're incredible at their jobs.

  1. Dependency injection of optional data: The simplest way to handle the optionality of data is to inject it into the APIs or wrapper SDKs at app startup. Modeling the data as an optional type from the time it's injected is the cleanest way to guarantee graceful degradation when it is absent. This mechanism is only effective when the data does not become available later in the application lifecycle, since a late arrival would break immutability.

  2. Optional dependencies and composable SDKs: In some cases where local authorities deem collecting certain types of data illegal, apps may get flagged for even having code that could collect that data. In these cases, apps distribute different binaries/app bundles to customers in different geographic locations. Isolating the collection of optional data into separate composable SDKs will allow customers to include specific parts of your API in specific application binaries. The code snippet below is not easily understandable without knowing the semantics of the components framework in Firebase. I’ll write about that in a separate post. But it should provide a sense of how SDKs can be decomposed and recomposed.
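
As a generic stand-in that does not use the Firebase components framework, here is a minimal sketch of both mechanisms together. Every name in it (`StartupData`, `CoreSdk`, `install_location_module`) is hypothetical. The startup data is an immutable object with optional fields, and the optional collector lives in a function that represents a separately packaged module, so a binary that omits the module carries no code for that data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class StartupData:
    """Data injected once at app startup; absent values stay None."""
    advertising_id: Optional[str] = None


class CoreSdk:
    """Hypothetical core SDK that degrades gracefully without optional data."""

    def __init__(self, data: StartupData) -> None:
        self._data = data
        self._collectors: List[Callable[[], Dict[str, str]]] = []

    def register_collector(self, collector: Callable[[], Dict[str, str]]) -> None:
        """Called by optional modules; if a module is not bundled, nothing registers."""
        self._collectors.append(collector)

    def build_event(self, name: str) -> Dict[str, str]:
        """Build an event from whatever data is present and registered."""
        event = {"name": name}
        if self._data.advertising_id is not None:
            event["advertising_id"] = self._data.advertising_id
        for collect in self._collectors:
            event.update(collect())
        return event


def install_location_module(sdk: CoreSdk) -> None:
    """Stands in for a separate package that only some binaries include."""
    sdk.register_collector(lambda: {"region": "unknown"})


# A binary without the location module and without an advertising ID.
minimal = CoreSdk(StartupData())
print(minimal.build_event("open"))

# A binary that bundles the optional module.
full = CoreSdk(StartupData(advertising_id="abc"))
install_location_module(full)
print(full.build_event("open"))
```

The core SDK never imports the optional module; the dependency points the other way, which is what lets a build drop the module entirely.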
