Human genetic data is sensitive
VCFanonymizer anonymizes a VCF file by shuffling genotypes between samples.
Provided the VCF contains enough samples, the anonymized file prevents reconstruction of the
genome-wide genotype of any participating individual.
However, anyone with access to the genotypes of an individual in the VCF (or of that individual's
close relatives) can confirm her or his presence in the VCF, and thereby potentially leak sensitive
information.
The best strategy to avoid leaking sensitive information is to use VCFs that are not linked to a
particular phenotype (e.g. control datasets of healthy individuals carry no sensitive phenotype
information).
A phenotype can be linked to a VCF file in several ways, for instance through information in the
VCF's metadata, through information about the study that produced the VCF,
by linking your public profile (e-mail, GitHub, Biostars) to your or your group's research, or
by linking genomes across public datasets, one of which has phenotype information (Gymrek, 2013).
Human genetic data is sensitive. Consider whether the VCF can be shared after anonymization without
violating participants' consent.
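
To illustrate the idea of shuffling genotypes between samples, here is a minimal sketch in Python. It is not VCFanonymizer's actual implementation: the function name `shuffle_genotypes` is hypothetical, and the sketch assumes an uncompressed, tab-separated VCF in which the sample columns of each record are permuted independently of every other record.

```python
import random


def shuffle_genotypes(vcf_lines, seed=None):
    """Permute the sample columns of every VCF record independently.

    Hypothetical helper for illustration only; assumes plain-text,
    tab-separated VCF lines with 9 fixed columns (CHROM..FORMAT).
    """
    rng = random.Random(seed)
    shuffled = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Metadata, header, and empty lines are passed through unchanged.
        if not line or line.startswith("#"):
            shuffled.append(line)
            continue
        fields = line.split("\t")
        fixed, samples = fields[:9], fields[9:]
        # A fresh permutation per record breaks the link between
        # a sample column and any one individual's genome.
        rng.shuffle(samples)
        shuffled.append("\t".join(fixed + samples))
    return shuffled


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0",
]
for out_line in shuffle_genotypes(example, seed=1):
    print(out_line)
```

Because each record keeps the same set of genotypes, per-site genotype counts are unchanged by this sketch; only the assignment of genotypes to sample columns differs. This is also why a person holding an individual's genotypes may still be able to check for that individual's presence in the file.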