anonym’s documentation!


The anonym library anonymizes sensitive data in Python so that users can work with, share, or publish their data without compromising privacy or violating data protection regulations. It uses Named Entity Recognition (NER) from spaCy to identify sensitive information in the data. Once identified, the library uses the Faker library to generate fake but realistic replacements. The Faker method is chosen to match the type of sensitive information (such as names, addresses, or dates), so the anonymized data keeps a structure and format similar to the original and remains suitable for further data analysis or testing.


Warning

Disclaimer: While the anonym library is designed to identify and replace sensitive information, the Named Entity Recognition (NER) process is stochastic, so some names or other privacy-sensitive information may not be identified and replaced. In addition, even when individual privacy-sensitive values such as names are faked, a combination of other features may still be identifying and may also need to be faked. Please review the anonymized data carefully before sharing or publishing.
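To illustrate the point about feature combinations, the sketch below counts which records remain unique on a set of columns after the names have been replaced. It is independent of anonym and uses only the standard library; the column names, the sample rows, and the helper `unique_combinations` are hypothetical.

```python
from collections import Counter

# Hypothetical records whose names have already been replaced with fakes.
records = [
    {"name": "Jane Doe", "zip": "1011", "birth_year": 1980, "job": "nurse"},
    {"name": "John Roe", "zip": "1011", "birth_year": 1980, "job": "nurse"},
    {"name": "Ann Poe", "zip": "2022", "birth_year": 1975, "job": "mayor"},
]


def unique_combinations(rows, columns):
    """Return the rows whose values for `columns` occur exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] == 1]


# Rows that are still singled out by the combination of these columns.
at_risk = unique_combinations(records, ["zip", "birth_year", "job"])
```

In this toy data one record stays unique on (zip, birth_year, job), so a fake name alone may not be enough to protect it.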


Start
  |
  v
Initialize `anonym` class
  |
  v
Import data using `import_data` method
  |
  v
Anonymize data using `anonymize` method
  |         |
  |         v
  |     Extract entities using `extract_entities` function
  |         |
  |         v
  |     Generate fake labels using `generate_fake_labels` function
  |         |
  |         v
  |     Replace original labels with fake ones using `replace_label_with_fake` function
  v
Export anonymized data using `to_csv` method
  |
  v
End
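
The three inner steps of the diagram can be sketched with the standard library alone. This is a toy stand-in, not anonym's implementation: the entity lookup is hardcoded where the library would use spaCy's NER, and the replacement pools are fixed where it would use Faker. The function names mirror the diagram for orientation only; their signatures and the sample data are assumptions.

```python
import re

# Hypothetical stand-in for an NER model: known strings and their labels.
KNOWN_ENTITIES = {"Alice Smith": "PERSON", "Amsterdam": "GPE"}

# Hypothetical stand-in for Faker: fixed replacement pools per label.
FAKE_POOLS = {
    "PERSON": ["Jane Doe", "John Roe"],
    "GPE": ["Springfield", "Rivertown"],
}


def extract_entities(texts):
    """Return {entity_text: label} for every known entity found in texts."""
    found = {}
    for text in texts:
        for entity, label in KNOWN_ENTITIES.items():
            if entity in text:
                found[entity] = label
    return found


def generate_fake_labels(entities):
    """Assign one fake value per entity so repeated mentions stay consistent."""
    used = {label: 0 for label in FAKE_POOLS}
    mapping = {}
    for entity, label in sorted(entities.items()):
        pool = FAKE_POOLS[label]
        mapping[entity] = pool[used[label] % len(pool)]
        used[label] += 1
    return mapping


def replace_label_with_fake(texts, mapping):
    """Replace each original entity with its fake counterpart."""
    if not mapping:
        return list(texts)
    # Longest entities first so they win over any shorter substrings.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    return [pattern.sub(lambda m: mapping[m.group(0)], t) for t in texts]


rows = ["Alice Smith lives in Amsterdam.", "Call Alice Smith tomorrow."]
entities = extract_entities(rows)
fake_map = generate_fake_labels(entities)
anonymized = replace_label_with_fake(rows, fake_map)
```

Building the mapping once and reusing it for every row keeps repeated mentions of the same entity consistent across the data, which matters if the anonymized output is used for analysis.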

Note

Your ❤️ is important for keeping this package maintained. Report bugs, issues, and feature requests on the GitHub page.

pip install anonym

Content

Installation