Description
Is your feature request related to a problem? Please describe.
I can encrypt text in batch with BatchAnonymizerEngine, but I cannot figure out how to decrypt it in batch.
Describe the solution you'd like
I would like to be able to decrypt in batch, the same way batch encryption already works.
Describe alternatives you've considered
I initially went row by row, but that was too time-consuming.
Additional context
What I have so far:
import pandas as pd
import pyarrow.dataset as ds
from dotenv import load_dotenv
from presidio_analyzer import AnalyzerEngine, BatchAnalyzerEngine
from presidio_anonymizer import BatchAnonymizerEngine, DeanonymizeEngine
from presidio_anonymizer.entities import OperatorConfig
# Load environment variables from .env file
load_dotenv()
# Open the Parquet dataset from S3
dataset = ds.dataset("s3://my-large-dataset.parquet", format="parquet")
# Take a subset of the dataset
scanner = dataset.scanner(columns=["free_text"])
table = scanner.head(100000)
df = table.to_pandas().reset_index(drop=True)
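# Keep only the unique texts (this is a Series, not a DataFrame)
texts = df["free_text"].drop_duplicates().reset_index(drop=True)
# Series to dict of lists, the shape analyze_dict expects
df_dict = {"free_text": texts.tolist()}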
# Analyze
analyzer = AnalyzerEngine()
batch_analyzer = BatchAnalyzerEngine(analyzer_engine=analyzer)
analyzer_results = batch_analyzer.analyze_dict(df_dict, language="en")
analyzer_results = list(analyzer_results)
# Encrypt ("SOME KEY" is a placeholder; the encrypt operator expects a 128-, 192-, or 256-bit AES key)
anonymizer_config = {"DEFAULT": OperatorConfig("encrypt", {"key": "SOME KEY"})}
batch_anonymizer = BatchAnonymizerEngine()
anonymizer_results = batch_anonymizer.anonymize_dict(analyzer_results, operators=anonymizer_config)
scrubbed_df = pd.DataFrame(anonymizer_results)
scrubbed_df
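
One possible workaround, sketched under assumptions rather than offered as the library's intended batch API: as far as I can tell, anonymize_dict returns only the encrypted text, while DeanonymizeEngine.deanonymize also needs the position of each encrypted value. The sketch below therefore encrypts each text with AnonymizerEngine.anonymize, keeps the returned items next to the text, and feeds both back to DeanonymizeEngine.deanonymize. The helper names encrypt_batch, decrypt_batch, and presidio_roundtrip are hypothetical, and the Presidio imports are local to the last function so the helpers can be exercised without the library installed.

```python
from typing import Any, Dict, List, Sequence, Tuple

# (encrypted_text, items): items record where each encrypted value
# sits inside encrypted_text, which is what decryption needs later.
EncryptedRow = Tuple[str, List[Any]]


def encrypt_batch(
    texts: Sequence[str],
    results_per_text: Sequence[List[Any]],
    anonymizer: Any,
    operators: Dict[str, Any],
) -> List[EncryptedRow]:
    """Encrypt each text and keep the entity positions for later decryption."""
    rows: List[EncryptedRow] = []
    for text, results in zip(texts, results_per_text):
        result = anonymizer.anonymize(
            text=text, analyzer_results=results, operators=operators
        )
        rows.append((result.text, result.items))
    return rows


def decrypt_batch(
    rows: Sequence[EncryptedRow],
    deanonymizer: Any,
    operators: Dict[str, Any],
) -> List[str]:
    """Reverse encrypt_batch by decrypting every row with its stored items."""
    return [
        deanonymizer.deanonymize(
            text=text, entities=items, operators=operators
        ).text
        for text, items in rows
    ]


def presidio_roundtrip(texts: Sequence[str], key: str) -> List[str]:
    """Hypothetical wiring of the helpers to Presidio (assumes it is installed)."""
    from presidio_analyzer import AnalyzerEngine, BatchAnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine, DeanonymizeEngine
    from presidio_anonymizer.entities import OperatorConfig

    batch_analyzer = BatchAnalyzerEngine(analyzer_engine=AnalyzerEngine())
    # One list of recognizer results per input text, in the same order.
    results_per_text = batch_analyzer.analyze_iterator(texts, language="en")

    rows = encrypt_batch(
        texts,
        results_per_text,
        AnonymizerEngine(),
        {"DEFAULT": OperatorConfig("encrypt", {"key": key})},
    )
    # The same key must be used for decryption.
    return decrypt_batch(
        rows,
        DeanonymizeEngine(),
        {"DEFAULT": OperatorConfig("decrypt", {"key": key})},
    )
```

This still calls the engines once per text, so it is closer to a tidy loop than to true batch decryption. The point is only that the items have to be stored alongside each encrypted text (for example in a second DataFrame column) for decryption to be possible at all.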