AI Data Privacy: Classify and Encrypt Data using CIPH3R FPE before Integrating with Gen AI

Before integrating and using Generative AI within your organization, it is imperative to establish and implement an AI Use Policy. This policy defines which internal data AI models may access and guides the integration process, particularly where Personally Identifiable Information (PII) is involved.

One approach is to establish a sandbox environment in which data is segregated and which serves as a gateway to Large Language Model (LLM) services. In addition, the data requirements of individual use cases may call for sensitive data to remain under the company's direct control, housed within a trusted environment. This setup is an application of the Generative AI technique known as Retrieval Augmented Generation (RAG), which integrates external knowledge from databases to improve the precision, domain specificity, and currency of results.

Company data must be handled carefully before it is supplied to chatbots or used to train generative AI models. It is imperative to classify Personally Identifiable Information (PII) and select suitable data encryption or masking methods. Developers must avoid supplying AI algorithms with PII, Highly Sensitive Personally Identifiable Information (HSPII), or copyrighted data and intellectual property. In certain use cases masking is not feasible, because it compromises the contextual integrity of the data. In such cases, Format Preserving Encryption (FPE) can be used instead, since encrypted values keep the length and character format of the originals.
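The difference between classifying, masking, and format-preserving encryption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regex patterns in `PII_PATTERNS` are hypothetical and far from exhaustive, and `toy_format_preserving` is a toy keyed digit shift, not a standardized FPE mode and not CIPH3R's algorithm. It exists only to show that a format-preserving transform keeps digits as digits, keeps separators in place, and can be reversed with the key, whereas masking cannot.

```python
import hashlib
import hmac
import re

# Hypothetical patterns for illustration; a real classifier needs much broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_pii(text: str) -> set:
    """Return the names of the PII categories whose pattern matches the text."""
    return {name for name, pattern in PII_PATTERNS.items() if pattern.search(text)}


def mask_digits(value: str) -> str:
    """Masking: replace every digit with 'X'. Irreversible; distinct values collapse."""
    return re.sub(r"[0-9]", "X", value)


def toy_format_preserving(value: str, key: bytes, decrypt: bool = False) -> str:
    """Toy keyed digit shift. NOT secure and NOT a standardized FPE mode.

    Each digit is shifted modulo 10 by an amount derived from the key and the
    digit's position; every non-digit character is copied through unchanged.
    """
    out = []
    position = 0
    for ch in value:
        if ch in "0123456789":
            mac = hmac.new(key, position.to_bytes(4, "big"), hashlib.sha256).digest()
            shift = mac[0] % 10
            digit = (int(ch) - shift) % 10 if decrypt else (int(ch) + shift) % 10
            out.append(str(digit))
            position += 1
        else:
            out.append(ch)
    return "".join(out)
```

With a sample value such as `"123-45-6789"`, `mask_digits` yields `"XXX-XX-XXXX"`, while `toy_format_preserving` yields another string of the same `ddd-dd-dddd` shape that decrypts back to the original when called with the same key and `decrypt=True`. A production system would rely on a vetted FPE implementation instead of this toy.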

Allow me to guide you through the technical steps involved in integrating your data with the CIPH3R solution, ensuring the safeguarding of your organization’s sensitive data through anonymization via encryption. Data ingestion facilitated by CIPH3R can seamlessly contribute to training datasets, bolstering the efficacy of your data-driven initiatives.

LangChain integration types

  • Document Loader

  • Vector stores

  • Chat Messages Memory

This post covers the Document Loader; the other two topics will be covered in later posts.

Technology Stack

Note: CIPH3R supports various other types of integration, such as databases and hypervisors.

LangChain Architecture

(Figure: LangChain architecture. Image credit: https://python.langchain.com)

This post does not aim to explain or implement the key features of LangChain; it outlines how the CIPH3R product integrates with LangChain, an AI integration framework.

The full list of document loaders is available in the LangChain documentation.

For this walkthrough, let us use the AWS S3 file integration as the document loader.

Install the boto3 Python package:


pip install --upgrade --quiet boto3

Connect CIPH3R mInjestor so that it reads and processes the sensitive data using a CIPH3R data conversion schema. Triggered either by an automated pipeline or manually through the CIPH3R portal, CIPH3R encrypts the input data into the FPE format defined in the schema. After processing, CIPH3R mInjestor writes the output to an AWS S3 bucket.
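Before pointing a loader at the bucket, it can be useful to confirm that the processed file has actually landed there. The following is a minimal sketch, assuming a boto3 S3 client is passed in and that the bucket and key are the placeholder names used in this post; `fpe_output_exists` is a hypothetical helper, not part of CIPH3R or LangChain.

```python
def fpe_output_exists(s3_client, bucket: str, key: str) -> bool:
    """Return True if an object with exactly this key exists in the bucket.

    Uses the S3 ListObjectsV2 operation with the key as prefix. Keys are
    returned in lexicographic order, so an exact match is listed first.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    # "Contents" is absent from the response when nothing matches the prefix.
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# s3 = boto3.client("s3")
# print(fpe_output_exists(s3, "FPE_output_s3bucket", "processed_fpe.csv"))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests, without touching AWS.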

Get in touch with CIPH3R to learn how to process data using the CIPH3R free tier.


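from langchain_community.document_loaders import S3FileLoader

# Bucket and object key of the FPE-encrypted output written by CIPH3R mInjestor
loader = S3FileLoader("FPE_output_s3bucket", "processed_fpe.csv")

# load() returns a list of LangChain Document objects
docs = loader.load()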

Because CIPH3R mInjestor writes only FPE-encrypted output to this bucket, the code above loads only encrypted data from S3 into your chains.
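Since FPE output looks like real data by design, pattern matching alone cannot tell encrypted values from plaintext. One way to sanity-check a load is to look for a few known test values that should never appear in the clear. The sketch below assumes only that each loaded item exposes a `page_content` string, as LangChain `Document` objects do; `find_plaintext_leaks` and the sample value are hypothetical.

```python
def find_plaintext_leaks(documents, known_plaintexts):
    """Return (document_index, value) pairs where a known plaintext appears verbatim."""
    leaks = []
    for index, doc in enumerate(documents):
        for value in known_plaintexts:
            if value in doc.page_content:
                leaks.append((index, value))
    return leaks


# Example usage with the list returned by loader.load():
# leaks = find_plaintext_leaks(loader.load(), ["123-45-6789"])
# if leaks:
#     raise RuntimeError(f"Plaintext values found in loaded documents: {leaks}")
```

An empty result does not prove that every value was encrypted; it only shows that the chosen test values did not pass through unchanged.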

In upcoming posts, we will cover vector stores and Chat Messages Memory in detail.

Happy learning!
