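Implementation of the polyalphabetic Vigenère cipher (written "Virginia" in the original slides) and of a key-recovery algorithm
Xianjin Fang (方贤进) http://star.aust.edu.cn/~xjfang Email: xjfang@aliyun.com
Programming assignment
(1) Use C or any other programming language; submit the source files for encryption, decryption and key recovery.
(2) Implement Vigenère encryption and decryption for any meaningful English text file (*.txt), where the key is an arbitrary input string. Provide the plaintext file and the ciphertext file.
(3) Without knowing the key, break a ciphertext file produced by Vigenère encryption: recover the key and generate the corresponding plaintext. Provide a document describing how the program was tested.
Vigenère encryption and decryption algorithms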
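Vigenère encryption: character set
Assume the character set of the language is Charset[26] = {'a', 'b', …, 'z'}, so the character set size is 26, and the corresponding character codes are Coding[26] = {0, 1, …, 25}.
Vigenère encryption: notation
Vigenère encryption cycles through several monoalphabetic substitution ciphers, chosen in turn as directed by the key, while encrypting the plaintext.
Plaintext: $M = m_1 m_2 \dots m_n$, $m_i \in$ Charset, where $n$ is the plaintext length.
Key: $K = k_1 k_2 \dots k_d$, $k_i \in$ Charset, where $d$ is the key length.
Ciphertext: $C = c_1 c_2 \dots c_n$, $c_i \in$ Charset, where $n$ is the ciphertext length.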
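Vigenère encryption and decryption formulas
Encryption: $c_{j+td} = (m_{j+td} + k_j) \bmod 26$
Decryption: $m_{j+td} = (c_{j+td} - k_j) \bmod 26$
In both formulas $j = 1, \dots, d$ and $t = 0, \dots, \lceil n/d \rceil - 1$ (only indices with $j + td \le n$ are used), where $\lceil x \rceil$ denotes the smallest integer not less than $x$. Letters are identified with their codes 0 to 25.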
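Example of Vigenère encryption
Plaintext length n = 11, key length d = 3, so t runs from 0 to ceiling(11/3) - 1 = 3. The key letters consistent with the listed codes are K = j (9), o (14), y (24).
i     t   j   plaintext m_i   key k_j   ciphertext c_i
1     0   1   n (13)          j (9)     w (22)
2     0   2   o (14)          o (14)    c (2)
3     0   3   t (19)          y (24)    r (17)
4     1   1   h (7)           j (9)     q (16)
5     1   2   i (8)           o (14)    w (22)
6     1   3   n (13)          y (24)    l (11)
7     2   1   g (6)           j (9)     p (15)
8     2   2   i (8)           o (14)    w (22)
9     2   3   s (18)          y (24)    q (16)
10    3   1   …               j (9)     …
11    3   2   …               o (14)    …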
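The encryption and decryption formulas can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference solution: the function names `vigenere_encrypt` and `vigenere_decrypt` are hypothetical, the input is assumed to be already reduced to lowercase a-z, and 0-based indexing is used, so index `i` plays the role of `j + td - 1`.

```python
A = ord("a")  # code 0 corresponds to 'a'


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """c_i = (m_i + k_(i mod d)) mod 26, for lowercase a-z input."""
    d = len(key)
    return "".join(
        chr((ord(m) - A + ord(key[i % d]) - A) % 26 + A)
        for i, m in enumerate(plaintext)
    )


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """m_i = (c_i - k_(i mod d)) mod 26, for lowercase a-z input."""
    d = len(key)
    return "".join(
        chr((ord(c) - ord(key[i % d])) % 26 + A)
        for i, c in enumerate(ciphertext)
    )


if __name__ == "__main__":
    ct = vigenere_encrypt("differential", "infosec")
    print(ct)                               # lvktwvgvgnod
    print(vigenere_decrypt(ct, "infosec"))  # differential
```

With the key `infosec` the first twelve plaintext letters `differential` map to `lvktwvgvgnod`, which matches the start of the ciphertext used later in these slides.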
An original plaintext
Differential Privacy is the state-of-the-art goal for the problem of privacy-preserving data release and privacy-preserving data mining. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods incur higher computing complexity and lower information to noise ratio, which renders the published data next to useless. This proposal aims to reduce computing complexity and signal to noise ratio. The starting point is to approximate the full distribution of high-dimensional dataset with a set of low-dimensional marginal distributions via optimizing score function and reducing sensitivity, in which generation of noisy conditional distributions with differential privacy is computed in a set of low-dimensional subspaces, and then, the sample tuples from the noisy approximation distribution are used to generate and release the synthetic dataset. Some crucial science problems would be investigated below: (i) constructing a low k-degree Bayesian network over the high-dimensional dataset via exponential mechanism in differential privacy, where the score function is optimized to reduce the sensitivity using mutual information, equivalence classes in maximum joint distribution and dynamic programming; (ii)studying the algorithm to compute a set of noisy conditional distributions from joint distributions in the subspace of Bayesian network, via the Laplace mechanism of differential privacy. (iii)exploring how to generate synthetic data from the differentially private Bayesian network and conditional distributions, without explicitly materializing the noisy global distribution. The proposed solution may have theoretical and technical significance for synthetic data generation with differential privacy on business prospects.
The plaintext after preprocessing (only characters in the character set are kept)
differentialprivacyisthestateoftheartgoalfortheproblemofprivacypreservingdatareleaseandprivacypreservingdataminingexistingtechniquesusingdifferentialprivacyhowevercannoteffectivelyhandlethepublicationofhighdimensionaldatainparticularwhentheinputdatasetcontainsalargenumberofattributesexistingmethodsincurhighercomputingcomplexityandlowerinformationtonoiseratiowhichrendersthepublisheddatanexttouselessthisproposalaimstoreducecomputingcomplexityandsignaltonoiseratiothestartingpointistoapproximatethefulldistributionofhighdimensionaldatasetwithasetoflowdimensionalmarginaldistributionsviaoptimizingscorefunctionandreducingsensitivityinwhichgenerationofnoisyconditionaldistributionswithdifferentialprivacyiscomputedinasetoflowdimensionalsubspacesandthenthesampletuplesfromthenoisyapproximationdistributionareusedtogenerateandreleasethesyntheticdatasetsomecrucialscienceproblemswouldbeinvestigatedbelowiconstructingalowkdegreebayesiannetworkoverthehighdimensionaldatasetviaexponentialmechanismindifferentialprivacywherethescorefunctionisoptimizedtoreducethesensitivityusingmutualinformationequivalenceclassesinmaximumjointdistributionanddynamicprogrammingiistudyingthealgorithmtocomputeasetofnoisyconditionaldistributionsfromjointdistributionsinthesubspaceofbayesiannetworkviathelaplacemechanismofdifferentialprivacyiiiexploringhowtogeneratesyntheticdatafromthedifferentiallyprivatebayesiannetworkandconditionaldistributionswithoutexplicitlymaterializingthenoisyglobaldistributiontheproposedsolutionmayhavetheoreticalandtechnicalsignificanceforsyntheticdatagenerationwithdifferentialprivacyonbusinessprospects
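The preprocessing step shown above can be sketched as follows. The slides only state that characters outside the character set are dropped; folding uppercase to lowercase first is an assumption inferred from the sample (for instance "Differential Privacy" becomes "differentialprivacy"), and the function name `preprocess` is hypothetical.

```python
def preprocess(text: str) -> str:
    """Lowercase the text and keep only the 26 letters a-z.

    Assumption: uppercase letters are folded to lowercase before filtering;
    digits, spaces, punctuation and non-ASCII characters are dropped.
    """
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")


if __name__ == "__main__":
    print(preprocess("Differential Privacy is the state-of-the-art goal"))
```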
Ciphertext after Vigenère encryption (encryption key = infosec)
lvktwvgvgnodttqifqqmubujglevmbkhziczglcsphweyvwttwoqseshxenjsgaxejgwvxqalrsxczrqsswgiaidjmxipddjiumeawfkfigfaarkvtjlawvqalhwgjvvviwwwavsuvmhnrwsfxkiyufazcklmcoixmehofrqbrktwgvqijzqlcvqqsllgxhgzagcbvtbgjjqtmraqgvfncfenlnyoarrieywuyniebvwrvprnbhyvlnyokivkbshsmpanqojkgvhrpwvqnnyhjmdcgjgwbkagnbyqgbutrkmpkhwvakjmehcetwbvsuusoxyjlaxaiaizgagzvstgvoigncfxqvbngwvcbvtkzmepejbvitagmshydtvxvwhfigfbwbvbbzgwpgafyvawrzbuckenivrglstmqzqwgquczharikbrddizqgdofhuqtsodxqvbngwvcbvthziubnwharixbnblmubbfdhvqfvrolivprkidpfqfyfafwbvtbgjjqtmraqgvfncfenlnyokivevyvswgbbkzgafqzjbkmqvnqasviqafzvmubenpmxkwaxjaeqxgnaadkvtxqgvgnhsqlmqvnsrjifcpnbywgvfnhazkblnbolkkulsfitigncfshvbngqgqvqnhaspiyiwkxtqozhaspajnhzhknsjfwrvqnqdjmxipdwkgquczhwhkvnxslshtbbraqgvfncfenahggheemffbvxjmayvwwcucqslyrtrxtjsobujbgmugnudjszqzfhasplvxhjmdcgncfetmhxsvxqorssjevmnsrjinmnxsllgalshzivqpioleumgxceiezhhwspukvjbuirzbgzwquebzzvfgqaaskxkonysvfgtbbwuspagwiuxkvtfzgamlrlfwidiljgaepvrykgvmwijfllgpvlvvmomaxwgrctqfhswgbinowbrwajblmctzjqzepqfrwfhknsjfwrvqnqdjmxipdkzitmgmskgqzrkifgvqbswksrbvrwrifbbwsvyemgmskipavywnmvghxwfkocgzodmpnbwasxkwajemmxiyjbuietnxgwwkvzflaqwuwtwfxfqfyfafwbvtbsrfllsoemexetujeouvsuamubhimaribujodkqzvyvexqkbrdmxgifjhgjpwvxmusplvywgrctqnglvkjhywgrunetabskvgiwkxtqozhaspavshziucoxdsggwsgoqiuqnsbwxywepjaevprqohpckrrsulcvvxagjfqsksjipbvfzhvkdnhmamkmkuzgvkvtmcoxqorssjevmfdbllgbvhrsxcnetallglvktwvgvgnodpaxenjsxgjndskmcvajhostsnsrusplvywgrctqnglvkjhywgruevyvgyvmkuzagkbydasxgzvfzadkvtyvwrqqfdudsdiyiwkxtqozhaspbujdjsrwfjrksncgncfqcgufjwxjmbwslmeiyfbvxgkuswuenavlbajkknsqwjqzfdbllgbvhrsxcorssjevqbskaxjlvktwvgvgnodttqifqqspjhxwfiuacwcktgkgxvzlj
Breaking the Vigenère key: a ciphertext-only attack
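Concept: the index of coincidence and its unbiased estimate
Index of coincidence: suppose a language consists of $n$ letters and letter $i$ occurs with probability $p_i$ ($1 \le i \le n$). The index of coincidence, written IC, is the probability that two randomly chosen letters are the same:
$IC = \sum_{i=1}^{n} p_i^2$
In practice IC is approximated by its unbiased estimate IC':
$IC' = \dfrac{\sum_{i=1}^{n} x_i (x_i - 1)}{L (L - 1)}$
where $x_i$ is the number of occurrences of letter $i$, $L$ is the length of the text, and $n$ is the number of letters in the language.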
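As an illustration, the unbiased estimate IC' (the sum of x_i (x_i - 1) over all letters, divided by L (L - 1)) can be computed directly from letter counts. This is a minimal sketch; the function name `ic_unbiased` is hypothetical, and returning 0.0 for texts shorter than two characters is a convention chosen here, not something the slides specify.

```python
from collections import Counter


def ic_unbiased(text: str) -> float:
    """Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1))."""
    L = len(text)
    if L < 2:
        return 0.0  # convention for degenerate input (an assumption)
    counts = Counter(text)  # x_i for every letter that occurs
    return sum(x * (x - 1) for x in counts.values()) / (L * (L - 1))


if __name__ == "__main__":
    # "aabb": x_a = x_b = 2, L = 4, so IC' = (2 + 2) / 12 = 1/3
    print(ic_unbiased("aabb"))
```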
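Three key properties of IC'
These are three very important conclusions; they can be checked with the experiments that follow.
(1) A random English-letter text always has IC' of about 0.038.
(2) A meaningful English text has IC' of about 0.065.
(3) Shift encryption does not change the IC' of a text.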
Example 1: a random English-letter plaintext and its IC'
The ciphertext obtained by shift-encrypting the random plaintext above (key = 17), and its IC'
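The invariance of IC' under a shift cipher can be checked with a small experiment. The sketch below assumes lowercase a-z input; `shift_encrypt` and `ic_unbiased` are illustrative names, and the sample string is simply the beginning of the preprocessed plaintext from these slides rather than the random text of Example 1, which is not reproduced in the source.

```python
from collections import Counter


def ic_unbiased(text: str) -> float:
    """Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1))."""
    L = len(text)
    if L < 2:
        return 0.0
    return sum(x * (x - 1) for x in Counter(text).values()) / (L * (L - 1))


def shift_encrypt(text: str, key: int) -> str:
    """Shift every lowercase letter forward by `key` positions (mod 26)."""
    return "".join(chr((ord(ch) - 97 + key) % 26 + 97) for ch in text)


if __name__ == "__main__":
    sample = "differentialprivacyisthestateoftheartgoal"
    shifted = shift_encrypt(sample, 17)
    # A shift only renames the letters, so the multiset of counts is unchanged
    # and both values printed below are identical.
    print(ic_unbiased(sample), ic_unbiased(shifted))
```

The two values agree because a shift is a one-to-one relabelling of letters: every count x_i moves to another letter but keeps its value.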
Example 2: a meaningful English text
The text is the original plaintext given earlier ("Differential Privacy is the state-of-the-art goal ... on business prospects."). Its unbiased IC estimate is IC' = 0.0659.
Suppose Vigenère encryption is applied to a meaningful English text. How can the resulting polyalphabetic ciphertext be broken (ciphertext-only attack)?
step 1: estimate the length of the Vigenère key
step 2: then compute each character of the key
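step 1: estimate the key length
(1) Split the ciphertext into 2 substrings and compute the average of their IC' values;
(2) Split the ciphertext into 3 substrings and compute the average of their IC' values;
……
(3) In general, split the ciphertext into d substrings (substring j takes every d-th letter starting from position j) and compute the average of their IC' values.
If the average IC' is approximately 0.065 when the ciphertext is split into d substrings, then the Vigenère key length is d.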
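Step 1 can be sketched as below. The helper names (`split_substrings`, `average_ic`, `estimate_key_length`), the search bound `max_d` and the acceptance threshold of 0.06 are all assumptions made for illustration; the slides only say that the average should be approximately 0.065.

```python
from collections import Counter


def ic_unbiased(text: str) -> float:
    """Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1))."""
    L = len(text)
    if L < 2:
        return 0.0
    return sum(x * (x - 1) for x in Counter(text).values()) / (L * (L - 1))


def split_substrings(ciphertext: str, d: int) -> list:
    """Substring j holds letters j, j+d, j+2d, ... (0-based j)."""
    return [ciphertext[j::d] for j in range(d)]


def average_ic(ciphertext: str, d: int) -> float:
    """Average IC' over the d substrings."""
    return sum(ic_unbiased(p) for p in split_substrings(ciphertext, d)) / d


def estimate_key_length(ciphertext: str, max_d: int = 20, threshold: float = 0.06):
    """Return the smallest d whose average IC' reaches the threshold, else None.

    The threshold is a hypothetical cut-off between about 0.038 and about 0.065.
    """
    for d in range(1, max_d + 1):
        if average_ic(ciphertext, d) >= threshold:
            return d
    return None


if __name__ == "__main__":
    prefix = "lvktwvgvgnodttqifqqmubujglevmbkhziczglcsphweyvwttwoqseshxenjsgaxejgwvx"
    print(split_substrings(prefix, 7)[0])  # lvqbmzwwxx
```

Taking the smallest qualifying d matters because multiples of the true key length also produce substrings that look like shifted English.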
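Example: split the ciphertext into 2 substrings. The average of the unbiased IC estimates of the 2 substrings is IC' = 0.0419.
Example: split the ciphertext into 3 substrings. The average of the unbiased IC estimates of the 3 substrings is IC' = 0.0419.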
Example: continuing in the same way, split the ciphertext into 7 substrings
Substring 1: lvqbmzwwxxqziimivqvanikmbqvxbqvliiplkavncabkmbxizivbpatibazimukqqvbbxbfpqbqvlebqvqbwxvnvcvbkivviqanqiuvtvammutbgqlcmommaqmzkzeqotavlivwpmtbwtqnqimzqbbmagcnwitvuqblxubbzkiwltjnvqacwqwpkvqbdmvombnlvxjvsltjembzvqiqbwcgmikakzboqlvqjak
Substring 2: vgiubgeoeearapegtavvryleriqhvtfneernbnhngguhevyavgbvegvgbfbvqcbgtbvnbbvrfvtfnvbznaeagthnpflugbqyojsnpcnbfhfacrunzvghrnnlpghvbbanbgtrlrivaqiazfsnpgrbvbgvhgbaynzwfvlevhuvbfvvqhegovosnerrvsvnktrfvevgenanvqhvkyvtfyoufgubyuvnfvrbvgihcg
Substring 3: knfjklyqnjlqidafjlvswumhkjqgtmnyybnysqryjntwhsjisnntjmxfzyurzzrdsntwnfrkytmnyykjqfnxnxssnnnlnnniznjqdzxbngfyqxjufxnxssxsixhjgzaybwfljyjlxfnjjrjqdmksrwmyxzwjjxftytstsijyrjxynytizsxgspqrxkfhumsdhtknndjsynyyudfydizjjnfwfslsdhssknfxwx
……
Substring 7: gtuvchthaxcgxufkvjwhkcxqvcgcjgnrnvvvpgqdkgpjwoagoqcetdfvgrntqizuqcuiuqvfwjgnvgfqiukqkgqfgkkthqptpkvxqkhgnejcrouzpdtqvngvueurugkgpkmdpmgocgrcpkvxtqvrfepvopkxekwfwfeouiqqgppckuktpuguyvccfpkkkqvgcggagctpckuvkgkqdtprncjegnkqgcvjgtpug
The average of the unbiased IC estimates of the 7 substrings is IC' = 0.0657.
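Splitting the ciphertext into several substrings and averaging the unbiased IC estimates
(Rows: number of substrings. For each row the substring lengths and the IC' values are listed in the order in which they appear in the source table; entries missing from the source are not filled in.)
1 substring: length 1609; IC' 0.0419
2 substrings: lengths 805, 804; IC' 0.0427, 0.0411
3 substrings: lengths 537, 536; IC' 0.0417, 0.0424
4 substrings: lengths 403, 402; IC' 0.0425, 0.0.98 [sic]
5 substrings: lengths 322, 321; IC' 0.0414, 0.0418, 0.0413, 0.0415
6 substrings: lengths 269, 268; IC' 0.0402, 0.0397, 0.0441, 0.0432, 0.0416
7 substrings: length 230 for substrings 1 to 6 with IC' 0.0674, 0.0677, 0.0621, 0.0584, 0.0744, 0.0666; length 229 for substring 7 with IC' 0.0634; average IC' 0.0657
8 substrings: … 0.0422
A meaningful English plaintext has IC ≈ 0.065, and shift encryption does not change the IC value, so when the ciphertext is split according to the true key length each substring (a shift-encrypted sample of the plaintext) also has IC ≈ 0.065. The table shows that this happens only for 7 substrings, so the key length is d = 7.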
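The substring lengths and the 7-substring average in the table can be re-derived with a short calculation. This sketch only uses numbers that appear in the table; the helper name `substring_lengths` is hypothetical.

```python
def substring_lengths(n: int, d: int) -> list:
    """Lengths of the d substrings when n letters are dealt out round-robin."""
    q, r = divmod(n, d)
    return [q + 1] * r + [q] * (d - r)  # the first r substrings get one extra letter


# IC' values of the 7 substrings, copied from the table above.
ic_d7 = [0.0674, 0.0677, 0.0621, 0.0584, 0.0744, 0.0666, 0.0634]
mean_d7 = sum(ic_d7) / len(ic_d7)

if __name__ == "__main__":
    print(substring_lengths(1609, 7))  # [230, 230, 230, 230, 230, 230, 229]
    print(round(mean_d7, 4))           # 0.0657
```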
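step 2: compute each character of the key
(1) By the Vigenère encryption algorithm, all ciphertext letters in one substring were obtained from plaintext letters by the same shift: substring i (i = 1…d) was shift-encrypted with the i-th character of the key. The key space of a shift cipher has only 26 elements. So for each ciphertext substring, try all 26 shifts as decryption keys and compute the quasi index of coincidence of the substring after each trial. The shift (letter code) that gives the highest quasi index of coincidence is the letter of the Vigenère key that corresponds to this substring.
(2) Repeating step (1) d times yields all the letters of the key.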
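Quasi index of coincidence
Suppose a language consists of $n$ letters and letter $i$ has statistical probability $p_i$ ($i = 1 \dots n$). Let $f_{i,j}$ be the number of occurrences of letter $i$ in ciphertext substring $C_j$ ($j = 1 \dots d$), counted after the trial shift has been applied, and let $n_j$ be the length of $C_j$. In its standard form, the quasi index of coincidence of the $j$-th substring is
$\sum_{i=1}^{n} p_i \, \dfrac{f_{i,j}}{n_j}$
It is close to 0.065 when the shifted substring has the letter distribution of English and noticeably lower otherwise.
Statistical probabilities ($p_i$) of the letters in plaintext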
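A minimal sketch of the quasi index of coincidence, computed as the sum over letters of p_i * f_i / n for one substring of length n. The probability table `ENGLISH_FREQ` is an assumption: it holds commonly cited approximate English letter frequencies, not necessarily the values used on the slides, and the function name `quasi_ic` is hypothetical.

```python
from collections import Counter

# Assumed approximate English letter probabilities for a..z (not from the slides).
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
]


def quasi_ic(substring: str) -> float:
    """Quasi IC = sum_i p_i * f_i / n for a lowercase a-z string of length n."""
    n = len(substring)
    if n == 0:
        return 0.0  # convention for empty input (an assumption)
    counts = Counter(substring)  # f_i for every letter that occurs
    return sum(ENGLISH_FREQ[ord(ch) - 97] * f for ch, f in counts.items()) / n


if __name__ == "__main__":
    print(quasi_ic("thestateoftheart"))
    print(quasi_ic("qzxjqzxjqzxjqzxj"))
```

Text made of frequent letters scores high and text made of rare letters scores low, which is what lets the attack recognise the correct trial shift.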
Example: trying all 26 shifts on ciphertext substring 3
Substring 3: knfjklyqnjlqidafjlvswumhkjqgtmnyybnysqryjntwhsjisnntjmxfzyurzzrdsntwnfrkytmnyykjqfnxnxssnnnlnnniznjqdzxbngfyqxjufxnxssxsixhjgzaybwfljyjlxfnjjrjqdmksrwmyxzwjjxftytstsijyrjxynytizsxgspqrxkfhumsdhtknndjsynyyudfydizjjnfwfslsdhssknfxwx
Compute the 26 quasi indices of coincidence obtained by applying each of the 26 shifts to ciphertext substring 3:
Shift     Quasi IC     Shift     Quasi IC
1 (b)     0.0387       14        0.0326
2 (c)     0.0325       15        0.0348
3 (d)     0.0324       16        0.0416
4 (e)     0.0368       17        0.0392
5 (f)     0.0615       18        0.0405
6         0.0433       19        0.0361
7         0.0332       20        0.0461
8         0.0279       21        0.0386
9         0.0468       22        0.0356
10        0.0384       23        0.0313
11        0.0365       24        0.0364
12                     25        0.0429
13                     26 (a)    0.0340
The highest value, 0.0615, occurs at shift 5, whose letter code is "f", so the third letter of the Vigenère key is "f". Proceeding in the same way for all 7 ciphertext substrings gives the Vigenère key "infosec".
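The search over 26 shifts for substring 3 can be sketched as follows. Shifts are numbered 0 to 25 here, so shift 0 plays the role of the slide's shift 26 (a). The letter probabilities in `ENGLISH_FREQ` are commonly cited approximate values and are an assumption, so the individual quasi-IC values may differ slightly from the table even though the winning shift is expected to agree. The names `shift_decrypt`, `quasi_ic` and `best_shift` are hypothetical.

```python
from collections import Counter

# Assumed approximate English letter probabilities for a..z (not from the slides).
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
]

# Ciphertext substring 3, copied from the slide.
SUBSTRING_3 = (
    "knfjklyqnjlqidafjlvswumhkjqgtmnyybnysqryjntwhsjisnntjmxfzyurzzrdsntwnfrk"
    "ytmnyykjqfnxnxssnnnlnnniznjqdzxbngfyqxjufxnxssxsixhjgzaybwfljyjlxfnjjrjq"
    "dmksrwmyxzwjjxftytstsijyrjxynytizsxgspqrxkfhumsdhtknndjsynyyudfydizjjnfw"
    "fslsdhssknfxwx"
)


def shift_decrypt(text: str, shift: int) -> str:
    """Undo a shift cipher: move every lowercase letter back by `shift`."""
    return "".join(chr((ord(ch) - 97 - shift) % 26 + 97) for ch in text)


def quasi_ic(text: str) -> float:
    """Quasi IC = sum_i p_i * f_i / n for a non-empty lowercase a-z string."""
    counts = Counter(text)
    return sum(ENGLISH_FREQ[ord(ch) - 97] * f for ch, f in counts.items()) / len(text)


def best_shift(substring: str) -> int:
    """Return the shift in 0..25 whose trial decryption has the highest quasi IC."""
    return max(range(26), key=lambda s: quasi_ic(shift_decrypt(substring, s)))


if __name__ == "__main__":
    s = best_shift(SUBSTRING_3)
    print(s, chr(s + 97))
```

Running `best_shift` on each of the d substrings and concatenating the resulting letters gives the full key; decrypting the ciphertext with that key then yields the plaintext.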
The End Thank you!