anonym’s documentation!
The `anonym` library is designed to anonymize sensitive data in Python, so that users can work with, share, or publish their data without compromising privacy or violating data protection regulations. It uses Named Entity Recognition (NER) from `spacy` to identify sensitive information in the data. Once identified, the `faker` library is used to generate fake but realistic replacements. Depending on the type of sensitive information (such as names, addresses, or dates), the corresponding `faker` methods are used, so that the anonymized data keeps a structure and format similar to the original and remains suitable for further data analysis or testing.
Warning
Disclaimer: While the `anonym` library is designed to identify and replace sensitive information, the stochastic nature of the Named Entity Recognition (NER) process means there is always a possibility that some names or other privacy-sensitive information will not be identified and replaced. In addition, faking individual privacy-sensitive fields such as names may not be enough: a combination of features can still point to one person, so that combination may also need to be faked. Please review the anonymized data carefully before sharing or publishing.
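To illustrate why a combination of features can still be sensitive after names are faked, here is a minimal sketch using only the standard library. The helper `unique_combinations` and the example records are hypothetical and made up for illustration; they are not part of `anonym`. The sketch lists value combinations that occur in exactly one row, which are candidates for manual review.

```python
from collections import Counter


def unique_combinations(rows, columns):
    """Return the value combinations of `columns` that occur in exactly one row.

    Hypothetical helper for illustration only; not part of the anonym API.
    """
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Made-up records in which the names have already been replaced.
records = [
    {"name": "Person 1", "city": "Delft", "birth_year": 1980, "job": "nurse"},
    {"name": "Person 2", "city": "Delft", "birth_year": 1980, "job": "nurse"},
    {"name": "Person 3", "city": "Delft", "birth_year": 1955, "job": "mayor"},
]

# Combinations that single out one row may still identify a person.
risky = unique_combinations(records, ["city", "birth_year", "job"])
print(risky)
```

In this made-up data, the third record is the only one with its combination of city, birth year, and job, so it could still be recognizable even though the name is fake.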
Start
|
v
Initialize `anonym` class
|
v
Import data using `import_data` method
|
v
Anonymize data using `anonymize` method
| |
| v
| Extract entities using `extract_entities` function
| |
| v
| Generate fake labels using `generate_fake_labels` function
| |
| v
| Replace original labels with fake ones using `replace_label_with_fake` function
v
Export anonymized data using `to_csv` method
|
v
End
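The three inner steps of the flow above (extract, generate, replace) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the implementation in `anonym`: a simple regular expression stands in for the `spacy` NER step, numbered placeholders stand in for `faker` output, and the `toy_*` function names are hypothetical.

```python
import re

# Assumption: a stand-in for NER. Two consecutive capitalized words are
# treated as a person name. Real NER is far more capable than this.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def toy_extract_entities(texts):
    """Collect unique name-like strings from the texts, in first-seen order."""
    found = []
    for text in texts:
        for match in NAME_PATTERN.findall(text):
            if match not in found:
                found.append(match)
    return found


def toy_generate_fake_labels(entities):
    """Map each original entity to a placeholder (a stand-in for faker output)."""
    return {entity: f"Person {i}" for i, entity in enumerate(entities, start=1)}


def toy_replace_labels(texts, mapping):
    """Replace every occurrence of each original entity with its fake label."""
    replaced = []
    for text in texts:
        for original, fake in mapping.items():
            text = text.replace(original, fake)
        replaced.append(text)
    return replaced


rows = ["Alice Smith met Bob Jones.", "Bob Jones called Alice Smith back."]
entities = toy_extract_entities(rows)
mapping = toy_generate_fake_labels(entities)
anonymized = toy_replace_labels(rows, mapping)
print(anonymized)
```

Building one mapping for the whole dataset means the same original value is always replaced by the same fake value, which keeps relationships between rows intact in this sketch.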
Note
Your ❤️ is important to keep maintaining this package. Please report bugs, issues, and feature requests on the GitHub page.
pip install anonym