Disclosed is a data-masking algorithm for arbitrary (i.e. not well-known) alphanumeric formats that preserves both the format and the uniqueness of the input, producing repeatable, identical output regardless of the data encoding scheme (e.g. ASCII, EBCDIC or Unicode).
A Method of Protecting Data Privacy for Dynamically-Formatted Data
There is a requirement to obscure data for privacy purposes without obscuring its format or uniqueness. For example, AbC-123 must produce something like QsE-795. Note that the output matches the input exactly in form: character type (letter or digit), letter case, and the presence and position of the dash. Two further requirements are repeatability (the same input, regardless of encoding or context, must always produce the same output) and uniqueness (distinct inputs must produce distinct outputs, so the output remains usable wherever the input was, e.g. as a key). These strict output constraints are not accommodated by conventional encryption techniques in the literature. Although this algorithm is not presented as an encryption technique, discovering the transformation would require examining a significant number of outputs whose corresponding inputs are known, so it is adequate for its intended data-privacy use.
This is a deterministic process that makes a single pass through the input value. The output format is derived from the input format, using the input characters as a template (i.e. the pattern of letters, digits, case and special characters is maintained). The data transformation is a continually varying combination of substitution cipher and transposition that produces an output value derived from, and unique to, the supplied input value. The user can influence the output value by providing an arbitrary hash seed, thereby customizing the results (i.e. for identical input, a user can expect a different result from that of any user with a different seed).
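The template idea can be illustrated with a small Python sketch. The helper names format_template and same_format are hypothetical, not part of the disclosure. Each position is classified as an upper-case letter ('U'), lower-case letter ('L'), digit ('D'), or kept as a literal special character. Membership in the 26 English letters and 10 digits is tested explicitly, because str.isupper() and str.isdigit() would also accept accented letters and non-ASCII digits. Because the sketch works on decoded characters rather than byte values, the same template results whether the value arrived as ASCII or EBCDIC (Python's cp037 codec is used here as one EBCDIC code page).

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = UPPER.lower()
DIGITS = "0123456789"


def char_class(ch: str) -> str:
    """Map one character to its template class."""
    if ch in UPPER:
        return "U"
    if ch in LOWER:
        return "L"
    if ch in DIGITS:
        return "D"
    return ch  # special characters (e.g. '-') are kept literally


def format_template(value: str) -> list:
    """Per-position template derived from the input characters."""
    return [char_class(ch) for ch in value]


def same_format(a: str, b: str) -> bool:
    """True if b has the same length, character classes and specials as a."""
    return format_template(a) == format_template(b)


print(format_template("AbC-123"))         # ['U', 'L', 'U', '-', 'D', 'D', 'D']
print(same_format("AbC-123", "QsE-795"))  # True

# Different byte encodings of the same value yield the same template.
ascii_bytes = "AbC-123".encode("ascii")
ebcdic_bytes = "AbC-123".encode("cp037")
print(ascii_bytes == ebcdic_bytes)        # False
print(format_template(ascii_bytes.decode("ascii"))
      == format_template(ebcdic_bytes.decode("cp037")))  # True
```

A checker like same_format is also a convenient test oracle: every masked output should satisfy it against its input.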
Initialization: Prior to processing the input -
1 - Construct shuffled arrays of the 10 digits and the 26 letters of the English alphabet
2 - Optionally, shuffle the arrays further using a hash of the user-provided seed value
3 - Establish a starting offset and increment for each array (optionally affected by the seed value hash)
4 - Construct a shuffled array of all 24 possible permutations of four character positions, arranged so that the first six are also the possible three-character permutations, i.e. they leave the fourth position in place (note that one of the permutations is the original order; the periodic omission of transposition is intended to inject some uncertainty into a reverse-engineering attempt)
5 - Establish separate starting offsets and increments for the three-character and four-character permutation arrays (note that the possible increment values for the 3! array are 1 and 5, and those used for the 4! array are 1, 11, 13 and 23. Each of these values is coprime to the size of its array, so stepping by it addresses every element of the array before repeating; 5, 7, 17 and 19 would also have this property for the 4! array)
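The five initialization steps can be sketched in Python as follows. This is a minimal sketch under assumptions that the disclosure does not state: SHA-256 in counter mode as the seed hash, a Fisher-Yates shuffle driven by that hash, a fixed built-in seed (the hypothetical string "built-in default") so that unseeded runs are still repeatable, and digit/letter increments drawn from the values coprime to the array size. All names (SeedStream, MaskingState, initialize, pick_cursor, full_cycle_steps) are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from itertools import permutations
from math import gcd

DIGITS = list("0123456789")
LETTERS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
PERM3_STEPS = [1, 5]            # increments given in step 5 for the 3! array
PERM4_STEPS = [1, 11, 13, 23]   # increments given in step 5 for the 4! array


class SeedStream:
    """Deterministic integers from SHA-256(seed || counter) (assumed hash).

    The seed is encoded as UTF-8 so the stream does not depend on the
    platform's native character encoding."""

    def __init__(self, seed: str):
        self._seed = seed.encode("utf-8")
        self._counter = 0
        self._buf = b""

    def next_int(self, bound: int) -> int:
        """Integer in [0, bound); plain modulo (slightly biased, fine for a sketch)."""
        while len(self._buf) < 4:
            counter = self._counter.to_bytes(8, "big")
            self._buf += hashlib.sha256(self._seed + counter).digest()
            self._counter += 1
        value = int.from_bytes(self._buf[:4], "big")
        self._buf = self._buf[4:]
        return value % bound


def shuffled(items, stream: SeedStream) -> list:
    """Fisher-Yates shuffle driven by the seed stream; returns a new list."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = stream.next_int(i + 1)
        out[i], out[j] = out[j], out[i]
    return out


def full_cycle_steps(n: int) -> list:
    """Increments that visit all n elements before repeating (coprime to n)."""
    return [k for k in range(1, n) if gcd(k, n) == 1]


def pick_cursor(size: int, steps: list, stream: SeedStream) -> tuple:
    """(starting offset, increment) for an array of the given size."""
    return stream.next_int(size), steps[stream.next_int(len(steps))]


@dataclass
class MaskingState:
    digits: list
    letters: list
    digit_cursor: tuple
    letter_cursor: tuple
    perm_table: list      # 24 permutations of positions 0..3
    perm3_cursor: tuple   # walks perm_table[:6]
    perm4_cursor: tuple   # walks all of perm_table


def initialize(user_seed=None) -> MaskingState:
    # Step 1: shuffle with a fixed built-in seed so unseeded runs are repeatable.
    stream = SeedStream("built-in default")
    digits = shuffled(DIGITS, stream)
    letters = shuffled(LETTERS, stream)
    # Step 2 (optional): shuffle further using a hash of the user's seed.
    if user_seed is not None:
        stream = SeedStream(user_seed)
        digits = shuffled(digits, stream)
        letters = shuffled(letters, stream)
    # Step 3: starting offsets and full-cycle increments (increment sets assumed).
    digit_cursor = pick_cursor(10, full_cycle_steps(10), stream)
    letter_cursor = pick_cursor(26, full_cycle_steps(26), stream)
    # Step 4: the six permutations that keep position 3 fixed come first,
    # so perm[:3] of each is a valid three-character permutation.
    perms = list(permutations(range(4)))
    first_six = shuffled([p for p in perms if p[3] == 3], stream)
    others = shuffled([p for p in perms if p[3] != 3], stream)
    # Step 5: separate cursors for the 3! and 4! views of the table.
    perm3_cursor = pick_cursor(6, PERM3_STEPS, stream)
    perm4_cursor = pick_cursor(24, PERM4_STEPS, stream)
    return MaskingState(digits, letters, digit_cursor, letter_cursor,
                        first_six + others, perm3_cursor, perm4_cursor)
```

For example, initialize("acme") builds the same MaskingState on every run, while initialize() falls back to the built-in seed. Deriving every choice from an explicit hash of the seed, rather than from a language's built-in random generator, keeps the state identical across platforms and runtime versions, which the repeatability requirement depends on.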
Loop: Perform the following steps in a single pass through the input va...