API References
- class anonym.anonym.anonym(language='dutch', verbose='info')
The anonym class anonymizes the contents of a DataFrame.
- anonymize(df, fakeit=None, do_not_fake=None, NER_blacklist=['CARDINAL', 'GPE', 'PRODUCT', 'DATE'])
Anonymize the input dataframe.
- Parameters:
df (pd.DataFrame) – DataFrame to be anonymized.
fakeit (dict, optional) – Dictionary of column names and their fake replacements, by default None
do_not_fake (list, optional) – List of column names that should not be faked, by default None
NER_blacklist (list, optional) – List of named entity recognition labels to be ignored, by default ['CARDINAL', 'GPE', 'PRODUCT', 'DATE']
Examples
>>> # Example 1
>>> filepath = r'./names_per_department.csv'
>>> filepath_fake = r'./names_per_department_fake.csv'
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='dutch', verbose='info')
>>> # Import csv data from file
>>> df = model.import_data(filepath, delim=';')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)
>>> # Write to csv
>>> model.to_csv(df_fake, filepath_fake)
Examples
>>> # Example 2
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym(language='english', verbose='info')
>>> # Import example data set
>>> df = model.import_example('titanic')
>>> # Anonymize the data set
>>> df_fake = model.anonymize(df)
- Returns:
Anonymized DataFrame.
- Return type:
pd.DataFrame
- import_data(filepath, delim=';')
Reads the dataset from the given filepath.
- Parameters:
filepath (str) – Path to the dataset file.
delim (str, optional) – Delimiter used in the dataset file, by default ';'
Examples
>>> # Example 1
>>> filepath = r'./names_per_department.csv'
>>> filepath_fake = r'./names_per_department_fake.csv'
>>> # Load library
>>> from anonym import anonym
>>> # Initialize
>>> model = anonym()
>>> # Import csv data from file
>>> df = model.import_data(filepath, delim=';')
>>> print(df)
- Returns:
Dataset read from the file.
- Return type:
pd.DataFrame
- import_example(data='titanic', url=None, sep=',')
Import an example dataset.
Loads one of the named example datasets from the GitHub source, or a dataset from a download URL that you specify.
- Parameters:
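data (str) – Name of the dataset; one of 'sprinkler', 'titanic', 'student', 'fifa', 'cancer', 'waterpump', 'retail'. By default 'titanic'.
url (str, optional) – URL of the dataset to download, by default None
sep (str, optional) – Delimiter used in the dataset file, by default ','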
- Returns:
Dataset containing mixed features.
- Return type:
pd.DataFrame
- to_csv(df_fake, filepath, delim=';')
Writes the DataFrame to a CSV file.
- Parameters:
df_fake (pd.DataFrame) – DataFrame to be written to a file.
filepath (str) – Path to the file where DataFrame will be written.
delim (str, optional) – Delimiter to be used in the file, by default ';'
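To illustrate what the `delim` parameter of `import_data` and `to_csv` controls, here is a minimal sketch that uses only the standard library. It does not show how anonym reads or writes files; `write_rows` and `read_rows` are hypothetical helpers introduced for this example.

```python
import csv
import io


def write_rows(rows, delim=";"):
    """Hypothetical helper: serialize rows to delimiter-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delim, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text, delim=";"):
    """Hypothetical helper: parse delimiter-separated text back into rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delim))


rows = [["name", "department"], ["Jan Jansen", "Sales, EMEA"]]
text = write_rows(rows)      # semicolon-separated, so the comma needs no quoting
restored = read_rows(text)   # the same delimiter must be used when reading
print(text)
```

Reading with a delimiter other than the one used for writing would leave each line as a single field, which is why both methods expose the same `delim` default.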
- anonym.anonym.check_spacy_model(language='dutch')
- anonym.anonym.clean_text(text)
Cleans the text by removing commas, dots, special characters, and extra spaces.
- Parameters:
text (str) – Text to be cleaned.
- Returns:
Cleaned text.
- Return type:
str
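The cleaning described above can be approximated with regular expressions. This is a minimal sketch, not the library's implementation: the name `clean_text_sketch` and the exact patterns are assumptions chosen to match the description (commas, dots, special characters, extra spaces).

```python
import re


def clean_text_sketch(text: str) -> str:
    """Hypothetical stand-in for clean_text; the real rules may differ."""
    # Replace commas and dots with a space so neighbouring words stay separated.
    text = re.sub(r"[.,]", " ", text)
    # Drop every remaining character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text_sketch("  Hello,   world!  Dr. Smith  "))
```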
- anonym.anonym.extract_entities(df, fakeit=None, do_not_fake=None, NER_blacklist=['CARDINAL', 'GPE', 'DATE', 'PRODUCT'], language='dutch')
Extracts entities from the DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame from which entities will be extracted.
fakeit (dict, optional) – Dictionary of column names and their fake replacements, by default None
do_not_fake (list, optional) – List of column names that should not be faked, by default None
NER_blacklist (list, optional) – List of named entity recognition labels to be ignored, by default ['CARDINAL', 'GPE', 'DATE', 'PRODUCT']
language (str, optional) – Language of the text in the DataFrame, by default 'dutch'
- Returns:
List of extracted entities.
- Return type:
list
- anonym.anonym.extract_entities_for_string(nlp, text, NER_blacklist=None)
Extracts entities from the given text.
- Parameters:
nlp – Loaded language model used to detect the entities.
text (str) – Text from which entities will be extracted.
NER_blacklist (list, optional) – List of named entity recognition labels to be ignored, by default None
- Returns:
List of extracted entities.
- Return type:
list
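The effect of `NER_blacklist` can be shown without loading a language model. The sketch below assumes that entities are available as `(text, label)` pairs; that shape, and the helper name `drop_blacklisted`, are assumptions made for illustration and may not match what the library returns.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical (text, label) pairs standing in for the output of an NER model.
Entity = Tuple[str, str]


def drop_blacklisted(entities: Iterable[Entity],
                     NER_blacklist: Optional[List[str]] = None) -> List[Entity]:
    """Keep only the entities whose label is not in NER_blacklist."""
    blacklist = set(NER_blacklist or [])
    return [(text, label) for text, label in entities if label not in blacklist]


found = [("Jan Jansen", "PERSON"), ("Amsterdam", "GPE"), ("42", "CARDINAL")]
kept = drop_blacklisted(found, NER_blacklist=["CARDINAL", "GPE", "PRODUCT", "DATE"])
print(kept)
```

With `NER_blacklist=None` nothing is ignored, so every entity is kept.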
- anonym.anonym.filter_for_values(df, rem_values=['nan', 'ja', 'nee', 'Ja', 'Nee', 'nvt', 'n v t', 'niet', 'Niet'])
Filters the DataFrame for given values.
- Parameters:
df (pd.DataFrame) – DataFrame to be filtered.
rem_values (list, optional) – List of values to be removed from the DataFrame, by default ['nan', 'ja', 'nee', 'Ja', 'Nee', 'nvt', 'n v t', 'niet', 'Niet']
- Returns:
List of filtered values.
- Return type:
list
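The filtering idea can be sketched on a plain list of values. The documented function takes a DataFrame; `filter_values_sketch` below is a hypothetical simplification that only demonstrates how the default `rem_values` remove uninformative entries such as 'ja', 'nee', and 'nvt'.

```python
from typing import Iterable, List, Optional

# Default values to remove, copied from the signature above.
DEFAULT_REM_VALUES = ['nan', 'ja', 'nee', 'Ja', 'Nee', 'nvt', 'n v t', 'niet', 'Niet']


def filter_values_sketch(values: Iterable[object],
                         rem_values: Optional[List[str]] = None) -> List[object]:
    """Return the values whose string form is not listed in rem_values."""
    remove = set(DEFAULT_REM_VALUES if rem_values is None else rem_values)
    return [value for value in values if str(value).strip() not in remove]


cells = ["Jan", "ja", "nvt", "Piet", float("nan"), "Nee"]
print(filter_values_sketch(cells))
```

The comparison is case-sensitive in this sketch, which is consistent with the default list containing both 'ja' and 'Ja'.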
- anonym.anonym.generate_fake_labels(NER)
Generates fake labels for the given entities.
- Parameters:
NER (list) – List of entities for which fake labels will be generated.
- Returns:
DataFrame containing original labels and their fake replacements.
- Return type:
pd.DataFrame
- anonym.anonym.get_logger()
Gets the current logger.
- Returns:
Current logger.
- Return type:
logging.Logger
- anonym.anonym.preprocessing(df)
Preprocesses the DataFrame by cleaning all its columns.
- Parameters:
df (pd.DataFrame) – DataFrame to be preprocessed.
- Returns:
Preprocessed DataFrame.
- Return type:
pd.DataFrame
- anonym.anonym.replace_label_with_fake(df, NER)
Replaces original labels in the DataFrame with their fake replacements.
- Parameters:
df (pd.DataFrame) – DataFrame in which labels will be replaced.
NER (pd.DataFrame) – DataFrame containing original labels and their fake replacements.
- Returns:
DataFrame with replaced labels.
- Return type:
pd.DataFrame
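Together, `generate_fake_labels` and `replace_label_with_fake` form a two-step process: build a mapping from original labels to replacements, then apply it. The sketch below mirrors that idea with plain Python structures. The placeholder format `ENTITY_n`, the dict-based mapping, and both function names are assumptions; the library works with DataFrames and may generate different replacement values.

```python
import re
from typing import Dict, List


def make_fake_labels(entities: List[str]) -> Dict[str, str]:
    """Map each distinct entity to a placeholder such as 'ENTITY_1'."""
    mapping: Dict[str, str] = {}
    for entity in entities:
        if entity not in mapping:
            mapping[entity] = f"ENTITY_{len(mapping) + 1}"
    return mapping


def replace_labels(rows: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace every occurrence of an original label with its placeholder."""
    if not mapping:
        return list(rows)
    # Try the longest labels first, so 'Jan Jansen' is matched before 'Jan'.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(label) for label in ordered))
    # A single pass ensures that inserted placeholders are never replaced again.
    return [pattern.sub(lambda m: mapping[m.group(0)], row) for row in rows]


mapping = make_fake_labels(["Jan Jansen", "Jan", "Jan Jansen"])
rows = ["Jan Jansen works with Jan", "No names here"]
replaced = replace_labels(rows, mapping)
print(replaced)
```

Using one consistent mapping means the same original label always receives the same replacement, so relationships between rows survive anonymization.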
- anonym.anonym.set_logger(verbose: [str, int] = 'info')
Sets the logger for verbosity messages.
- Parameters:
verbose ([str, int], default is 'info' or 20) – Sets the verbose messages using string or integer values.
* [0, 60, None, 'silent', 'off', 'no']: No message.
* [10, 'debug']: Messages from debug level and higher.
* [20, 'info']: Messages from info level and higher.
* [30, 'warning']: Messages from warning level and higher.
* [50, 'critical']: Messages from critical level and higher.
- Return type:
None.
Examples
>>> # Set the logger to warning
>>> set_logger(verbose='warning')
>>> # Test with different messages
>>> logger.debug("Hello debug")
>>> logger.info("Hello info")
>>> logger.warning("Hello warning")
>>> logger.critical("Hello critical")
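The accepted `verbose` values listed above can be translated into standard `logging` levels. The sketch below is an assumption about how such a mapping might look, not the library's code; `to_level` and the logger name `anonym_sketch` are hypothetical.

```python
import logging

# Assumed mapping, taken from the accepted values listed for `verbose`.
_LEVELS = {
    "silent": 60, "off": 60, "no": 60,
    "debug": 10, "info": 20, "warning": 30, "critical": 50,
}


def to_level(verbose):
    """Convert a verbose setting (str, int or None) to a numeric logging level."""
    if verbose is None or verbose == 0:
        return 60  # above CRITICAL (50), so no message passes
    if isinstance(verbose, str):
        return _LEVELS[verbose.lower()]
    return int(verbose)


logger = logging.getLogger("anonym_sketch")
logger.setLevel(to_level("warning"))

# Info is below the threshold, warning is not.
print(logger.isEnabledFor(logging.INFO), logger.isEnabledFor(logging.WARNING))
```

Level 60 sits above `logging.CRITICAL`, which is why the "silent" settings suppress every message.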