AI Data Privacy: Classify and Encrypt Data using CIPH3R FPE before Integrating with Gen AI


Before integrating Generative AI into your organization, establish an AI Use Policy. The policy defines which internal data AI models may access and guides the integration process, particularly where Personally Identifiable Information (PII) is involved.

One approach is to set up a sandbox environment in which data is segregated and which serves as a gateway to Large Language Model (LLM) services. Some use cases also have data requirements that call for keeping sensitive data under the company's direct control, in a trusted environment. This setup is an example of Retrieval Augmented Generation (RAG), a Generative AI technique that brings in external knowledge from databases to make results more accurate, domain-specific, and current.

Before you provide company data to chatbots or use it to train generative AI models, classify the Personally Identifiable Information (PII) in it and select a suitable encryption or masking method. Developers must avoid supplying AI algorithms with PII, Highly Sensitive Personally Identifiable Information (HSPII), or copyrighted data and intellectual property. In some use cases masking is not feasible because it destroys the context the data carries. In such cases, Format Preserving Encryption (FPE) can be used instead. FPE produces ciphertext with the same length and character set as the input, so encrypted values keep their structure and remain usable in context.
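To make the difference between masking and FPE concrete, here is a minimal sketch in Python. It is a toy Feistel construction written only for illustration: it is not CIPH3R's algorithm, not a standardized FPE mode, and not secure for real data. The function names, the key, and the sample card number are all hypothetical.

```python
import hashlib
import hmac


def mask(value: str, keep_last: int = 4) -> str:
    """Masking: overwrite all but the last few characters (not reversible)."""
    return "X" * (len(value) - keep_last) + value[-keep_last:]


def _round_value(key: bytes, round_no: int, data: str, modulus: int) -> int:
    """Keyed pseudo-random number in [0, modulus) derived with HMAC-SHA256."""
    digest = hmac.new(key, f"{round_no}:{data}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def toy_fpe_encrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Toy illustration only: digits in, the same number of digits out."""
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two decimal digits")
    half = len(digits) // 2
    a, b = digits[:half], digits[half:]
    for i in range(rounds):
        modulus = 10 ** len(a)
        c = (int(a) + _round_value(key, i, b, modulus)) % modulus
        # Swap halves; zfill keeps leading zeros so the length never changes.
        a, b = b, str(c).zfill(len(a))
    return a + b


def toy_fpe_decrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Reverse toy_fpe_encrypt by undoing the rounds in the opposite order."""
    half = len(digits) // 2
    # The two halves trade lengths every round, so the split depends on parity.
    split = half if rounds % 2 == 0 else len(digits) - half
    a, b = digits[:split], digits[split:]
    for i in reversed(range(rounds)):
        modulus = 10 ** len(b)
        prev_a = (int(b) - _round_value(key, i, a, modulus)) % modulus
        a, b = str(prev_a).zfill(len(b)), a
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key
card = "4111111111111111"             # hypothetical sample value
token = toy_fpe_encrypt(card, key)
print(mask(card))                   # XXXXXXXXXXXX1111 (context lost)
print(token)                        # 16 digits, still shaped like the input
print(toy_fpe_decrypt(token, key))  # the original value again
```

The masked value can no longer be joined, validated, or decrypted, while the encrypted value still passes as a 16-digit field and can be recovered by whoever holds the key.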

The steps below walk through integrating your data with the CIPH3R solution so that your organization’s sensitive data is anonymized through encryption. Data ingested through CIPH3R can then be added to training datasets for your data-driven initiatives.

LangChain integration types

  • Document Loader

  • Vector stores

  • Chat Messages Memory

This blog covers the Document Loader; the other two topics will be covered in later posts.

Technology Stack

Note: CIPH3R supports various types of integration, such as databases and hypervisors.

LangChain Architecture

(Image Credits: https://python.langchain.com)


This blog does not aim to explain or implement the key features of LangChain, but to outline how the CIPH3R product integrates with the LangChain AI integration framework.

The available document loaders are listed in the LangChain documentation.

For the document loader, let us use the AWS S3 file integration.

Install the boto3 Python package:


pip install --upgrade --quiet boto3

Connect CIPH3R mInjestor to read and process sensitive data using the CIPH3R data conversion schema. Triggered by an automated pipeline or manually through the CIPH3R portal, CIPH3R encrypts the input data into the FPE format defined in the schema. After processing, CIPH3R mInjestor writes the output to an AWS S3 bucket.

Contact CIPH3R to learn how to use the CIPH3R free tier to process data.
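Before loading, it can help to confirm that the processed file has actually landed in the bucket. The following is a minimal sketch, assuming a standard boto3 S3 client; the helper name s3_object_exists is hypothetical and not part of CIPH3R or LangChain, and the bucket and key are the placeholder values used in this blog.

```python
def s3_object_exists(s3_client, bucket: str, key: str) -> bool:
    """Return True if an object with exactly this key exists in the bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Ask S3 for the first object whose key starts with `key`; keys are listed
    # in lexicographic order, so an exact match would come back first.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Usage (requires AWS credentials to be configured):
#   import boto3
#   s3 = boto3.client("s3")
#   if not s3_object_exists(s3, "FPE_output_s3bucket", "processed_fpe.csv"):
#       raise FileNotFoundError("CIPH3R output not found in S3 yet")
```

Passing the client in as an argument keeps the helper easy to test and lets you reuse a client that is already configured with the right region and credentials.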


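from langchain_community.document_loaders import S3FileLoader

# Bucket and key of the FPE-encrypted output written by CIPH3R
loader = S3FileLoader("FPE_output_s3bucket", "processed_fpe.csv")

# Load the encrypted file as a list of LangChain Document objects
docs = loader.load()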

The above code ensures that only the encrypted data is loaded from S3 into your chains.

The next blogs will cover vector stores and chat messages memory in detail.

Happy learning!
