Quickstart to anonymize your data set
Anonymize the input dataframe.
- param df:
DataFrame to be anonymized.
- type df:
pd.DataFrame
- param fakeit:
Dictionary mapping column names to the entity label used to fake them (for example 'PERSON' or 'DATE'), by default None
- type fakeit:
dict, optional
- param do_not_fake:
List of column names that should not be faked, by default None
- type do_not_fake:
list, optional
- param NER_blacklist:
List of named entity recognition labels to be ignored, by default ['CARDINAL', 'GPE', 'PRODUCT', 'DATE']
- type NER_blacklist:
list, optional
Examples
>>> # Example 1
>>> filepath=r'./names_per_department.csv'
>>> filepath_fake=r'./names_per_department_fake.csv'
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='dutch', verbose='info')
>>> # Import csv data from file
>>> df = model.import_data(filepath, delim=';')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)
>>> # Write to csv
>>> model.to_csv(df_fake, filepath_fake)
Examples
>>> # Example 2
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='english', verbose='info')
>>> # Import example data set
>>> df = model.import_example('titanic')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)
- returns:
Anonymized DataFrame.
- rtype:
pd.DataFrame
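One way to picture how the three optional parameters interact is the small, library-independent sketch below. The helper `plan_column`, its `detected_label` argument, and the precedence order (do_not_fake first, then fakeit, then the detected label filtered by NER_blacklist) are assumptions made for illustration. They are not taken from the anonym source code, and the library may resolve these cases differently.

```python
# Hypothetical helper, NOT part of anonym: decide what happens to one column.
# The default blacklist is copied from the parameter description above.
DEFAULT_NER_BLACKLIST = ['CARDINAL', 'GPE', 'PRODUCT', 'DATE']


def plan_column(column, detected_label, fakeit=None, do_not_fake=None, NER_blacklist=None):
    """Return the entity label to fake a column with, or None to leave it untouched."""
    fakeit = fakeit or {}
    do_not_fake = do_not_fake or []
    if NER_blacklist is None:
        NER_blacklist = DEFAULT_NER_BLACKLIST

    # Assumption: an explicit "do not fake" always wins.
    if column in do_not_fake:
        return None
    # Assumption: a forced entity overrides whatever was detected.
    if column in fakeit:
        return fakeit[column]
    # Assumption: blacklisted or missing labels mean the column is left as-is.
    if detected_label is None or detected_label in NER_blacklist:
        return None
    return detected_label


print(plan_column('Naam', 'ORG', fakeit={'Naam': 'PERSON'}))
print(plan_column('Functie', 'PERSON', do_not_fake=['Functie']))
```

Under these assumptions the first call yields the forced label and the second leaves the column untouched.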
Anonymize data set with user-defined specifications
The anonym library automatically anonymizes all available data in a data set.
Because it relies on named entity recognition (NER), not every column is anonymized correctly. We can
control this manually by specifying the entity for each column. The example below demonstrates how
to force some columns into entities such as PERSON or DATE while others remain untouched.
Note that all columns that are not specified are automatically assigned a detected entity and faked consistently.
# Filepath
filepath=r'C:\temp\extern_people_per_dep.csv'
filepath_fake=r'C:\temp\extern_people_per_dep_fake.csv'
# Import library
from anonym import anonym
# Initialize
model = anonym(verbose='info')
# Import data set
df = model.import_data(filepath)
# Set column names that need to remain untouched
do_not_fake=['Functie', 'ID (functie)']
# Force the following columns to be categorized as a specific entity
fakeit = {'Budgethouder': 'PERSON',
          'Behoeftesteller': 'PERSON',
          'Project- afdeling': 'ORG',
          'Financieringsbron': 'EVENT',
          'Naam': 'PERSON',
          'Startdatum': 'DATE',
          'Einddatum': 'DATE',
          'Mogelijke Einddatum': 'DATE',
          'Totaal verpl.': 'MONEY',
          'Kasrealisatie': 'MONEY',
          }
# Run model
df_fake = model.anonymize(df, fakeit=fakeit, do_not_fake=do_not_fake)
# Export to csv
model.to_csv(df_fake, filepath_fake)
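What "faked consistently" means can be illustrated with a minimal, standard-library sketch: the same original value always receives the same replacement, so relationships between rows survive anonymization. The function `make_consistent_faker`, the placeholder format, and the sample names are hypothetical and only illustrate the idea; this is not how anonym generates its fake values.

```python
import itertools


def make_consistent_faker(prefix):
    """Return a function that maps each distinct value to one stable placeholder."""
    mapping = {}                      # original value -> placeholder
    counter = itertools.count(1)      # running number for new placeholders

    def fake(value):
        # Reuse the placeholder if this value was seen before.
        if value not in mapping:
            mapping[value] = f"{prefix}_{next(counter)}"
        return mapping[value]

    return fake


# Made-up sample column in which one name occurs twice.
fake_person = make_consistent_faker("PERSON")
names = ["Jan Jansen", "Piet de Vries", "Jan Jansen"]
fake_names = [fake_person(name) for name in names]
print(fake_names)  # ['PERSON_1', 'PERSON_2', 'PERSON_1']
```

Because the first and third entries share an original value, they also share a placeholder, which keeps group-by and join operations on the anonymized column meaningful.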