Quickstart: anonymize your data set with anonym

Anonymize the input dataframe.

:param df: DataFrame to be anonymized.
:type df: pd.DataFrame
:param fakeit: Dictionary that maps column names to the entity label used to fake them (for example 'PERSON' or 'DATE'), by default None
:type fakeit: dict, optional
:param do_not_fake: List of column names that should not be faked, by default None
:type do_not_fake: list, optional
:param NER_blacklist: List of named entity recognition labels to be ignored, by default ['CARDINAL', 'GPE', 'PRODUCT', 'DATE']
:type NER_blacklist: list, optional
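
A minimal sketch of what ignoring blacklisted labels could look like, assuming that entities whose label appears in `NER_blacklist` are left unchanged instead of being faked. The `select_entities` helper and the sample `(text, label)` pairs are hypothetical illustrations and not part of the anonym API.

```python
# Default blacklist as documented for the NER_blacklist parameter.
DEFAULT_NER_BLACKLIST = ['CARDINAL', 'GPE', 'PRODUCT', 'DATE']


def select_entities(entities, ner_blacklist=None):
    """Return only the (text, label) pairs whose label is not blacklisted.

    Hypothetical helper: it assumes blacklisted labels are skipped
    (left as-is) rather than faked.
    """
    if ner_blacklist is None:
        ner_blacklist = DEFAULT_NER_BLACKLIST
    blocked = set(ner_blacklist)
    return [(text, label) for text, label in entities if label not in blocked]


# Hypothetical NER output for a single cell.
detected = [('Jan Jansen', 'PERSON'), ('Amsterdam', 'GPE'),
            ('12 March 2021', 'DATE'), ('Acme BV', 'ORG')]
print(select_entities(detected))
# [('Jan Jansen', 'PERSON'), ('Acme BV', 'ORG')]
```

Passing an empty list as `ner_blacklist` in this sketch keeps every detected entity as a candidate for faking.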

Examples

>>> # Example 1
>>> filepath=r'./names_per_department.csv'
>>> filepath_fake=r'./names_per_department_fake.csv'
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='dutch', verbose='info')
>>> # Import csv data from file
>>> df = model.import_data(filepath, delim=';')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)
>>> # Write to csv
>>> model.to_csv(df_fake, filepath_fake)

>>> # Example 2
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='english', verbose='info')
>>> # Import example data set
>>> df = model.import_example('titanic')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)

:returns: Anonymized DataFrame.
:rtype: pd.DataFrame

Anonymize data set with user defined specifications

The anonym library automatically anonymizes all data in a data set. Because entities are detected with named entity recognition (NER), the entities in some columns may not be recognized correctly, and those columns are then not anonymized correctly. You can control this manually by specifying the entity for each column. The example below shows how to force some columns to a specific entity, such as PERSON or DATE, while other columns remain untouched. Columns that are not specified are still processed automatically: their entities are detected and faked consistently.
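
Consistent faking can be read as: the same original value always receives the same fake value, so that, for example, a person who appears in several rows is replaced by the same fake person everywhere. Below is a minimal sketch under that assumption. `ConsistentFaker` is a hypothetical class, and its placeholder fakes such as `PERSON_1` stand in for the realistic values a real tool would generate.

```python
from itertools import count


class ConsistentFaker:
    """Map each (label, original value) pair to one stable fake value.

    Hypothetical illustration: fakes are simple placeholders such as
    'PERSON_1'; a real tool would generate realistic names, dates, etc.
    """

    def __init__(self):
        self._mapping = {}   # (label, original) -> fake value
        self._counters = {}  # label -> running counter for that label

    def fake(self, value, label):
        key = (label, value)
        if key not in self._mapping:
            # First time we see this value for this label: assign a new fake.
            counter = self._counters.setdefault(label, count(1))
            self._mapping[key] = f'{label}_{next(counter)}'
        return self._mapping[key]


faker = ConsistentFaker()
names = ['Jan Jansen', 'Piet de Vries', 'Jan Jansen']
print([faker.fake(name, 'PERSON') for name in names])
# ['PERSON_1', 'PERSON_2', 'PERSON_1']
```

Keying the lookup table on the entity label as well as the value keeps, for example, a person and an organization with the same name from sharing one fake.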

# Filepath
filepath=r'C:\temp\extern_people_per_dep.csv'
filepath_fake=r'C:\temp\extern_people_per_dep_fake.csv'

# Import library
from anonym import anonym
# Initialize
model = anonym(verbose='info')
# Import data set
df = model.import_data(filepath)

# Set column names that need to remain untouched
do_not_fake=['Functie', 'ID (functie)']

# Force the following columns to be categorized as a specific entity
fakeit = {'Budgethouder': 'PERSON',
          'Behoeftesteller': 'PERSON',
          'Project- afdeling': 'ORG',
          'Financieringsbron': 'EVENT',
          'Naam': 'PERSON',
          'Startdatum': 'DATE',
          'Einddatum': 'DATE',
          'Mogelijke Einddatum': 'DATE',
          'Totaal verpl.': 'MONEY',
          'Kasrealisatie': 'MONEY',
          }

# Run model
df_fake = model.anonymize(df, fakeit=fakeit, do_not_fake=do_not_fake)

# Export to csv
model.to_csv(df_fake, filepath_fake)
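
How `fakeit` and `do_not_fake` divide the columns can be sketched as follows, assuming that columns in `do_not_fake` are kept, columns in `fakeit` are forced to the given entity, and all other columns go to automatic detection. The `plan_columns` helper and the `Opmerking` column are hypothetical; the helper also rejects column names that are missing from the data or listed in both arguments, which catches typos in long column names before the model runs.

```python
def plan_columns(columns, fakeit=None, do_not_fake=None):
    """Decide per column whether it is kept, forced to an entity, or auto-detected.

    Hypothetical helper that mirrors the behaviour described for fakeit and
    do_not_fake; it is not part of the anonym API.
    """
    fakeit = fakeit or {}
    do_not_fake = do_not_fake or []

    # Catch typos: every specified column must exist in the data set.
    unknown = (set(fakeit) | set(do_not_fake)) - set(columns)
    if unknown:
        raise ValueError(f'Unknown column(s): {sorted(unknown)}')

    # A column cannot be both kept and forced to an entity.
    overlap = set(fakeit) & set(do_not_fake)
    if overlap:
        raise ValueError(f'Column(s) in both fakeit and do_not_fake: {sorted(overlap)}')

    plan = {}
    for col in columns:
        if col in do_not_fake:
            plan[col] = 'keep'
        elif col in fakeit:
            plan[col] = f'force {fakeit[col]}'
        else:
            plan[col] = 'auto-detect'
    return plan


# Hypothetical column list for illustration.
columns = ['Naam', 'Functie', 'Startdatum', 'Opmerking']
print(plan_columns(columns,
                   fakeit={'Naam': 'PERSON', 'Startdatum': 'DATE'},
                   do_not_fake=['Functie']))
# {'Naam': 'force PERSON', 'Functie': 'keep', 'Startdatum': 'force DATE', 'Opmerking': 'auto-detect'}
```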