Kamlesh Kantilal (kamlesh@cs.arizona.edu)
This algorithm detects boolean variables and arrays and modifies all uses and definitions of these variables. Every boolean variable or array element is split into 2 variables or array elements, and the state of the original variable is reflected in the combined state of these 2 variables or array elements. This algorithm consists of two components. One is boolean variable identification. The other is splitting.
An assignment to a boolean variable in java source code generates the following java bytecode sequence:
ifeq l1 iconst 0 goto l2 l1: iconst 1 l2: istore boolean var
An boolean operation is implemented similarly. For example, b=a&&c generates the following code sequence:
iload a ifeq l1 iload c ifeq l1 iconst 1 goto l2 l1: iconst 0 l2: istore 1
We do data flow analysis to identify the possbile value range of any
variable, if it is [0,1], we consider it as boolean variable. This
information is partly obtained using the stack simulator. But it does
not work for the following cases:
b=a, whose code sequence is:
iload a istore b
Hence, we have to do something else. An iterative algorithm is used wherein on every iteration we identify non boolean variables and eliminate them. This iteration stops when no elimination were made in the last iteration as we have identified all the non boolean variables. The remaining variables are considered as boolean.
The basic idea of the XOR splitting technique is as follows:
1 boolean a = true; 2 boolean b = false;
is converted to:
1 boolean a1 = true,a2 = false; 2 boolean b1 = false,b2 = false;
or
1 boolean a1 = false,a2 = true; 2 boolean b1 = true,b2 = true;
In general, a single boolean variable will be replaced with a pair of boolean variables such that the state of the original variable at any program point is equal to the XOR of the 2 new variables. Likewise, a boolean array 'a' will be replaced by an array 'b' with twice as many elements, in which the state of array element a[i] in the original array at any program point is equal to the XOR of b[2*i] and b[2*i + 1].
Our implementation only obfuscates boolean variables where all assignments to the variables are in one of the following forms:
1 iconst_0 2 istore_1 3 45 6 iconst_1 7 istore_1
or
iload_1 istore_2
or
12 ifeq L1 3 iconst_1 4 goto L2 5 L1: 6 iconst_0 7 L2: 8 istore_1
These patterns are easily modi ed to work as split variables:
1 ifeq L1 2 iconst_1 3 iconst_0 4 goto L2 5 L1: 6 iconst_0 7 iconst_0 8 L2: 9 istore_1 10 istore_2
Our implementation only obfuscates arrays that are allocated in a function and used in speci c ways exclusively in that function. Allocation must look like this (all examples use the XOR obfuscation):
1 newarray boolean 2 astore_1
This pattern is converted to this pattern in the obfuscated code, which produces an array that is twice as long:
1 iconst_1 2 ishl 3 newarray boolean 4 astore_1
All stores must look approximately like this:
1 aload_1 2 iload_2 3 iload_2 4 ifeq L1 5 iconst_1 6 goto L2 7 L1: 8 iconst_0 9 L2: 10 bastore
This pattern is obfuscated as follows:
ALOAD_1 ILOAD_2 DUP2_X1 POP DUP_X1 POP DUP DUP2_X1; POP POP ICONST_1 ISHL DUP DUP_X2 POP ICONST_1 IADD ICONST_0 BASTORE DUP_X2 POP DUP_X2 POP BASTORE POP
All uses must look approximately like this:
1 aload_1 2 iload_2 3 baload
This pattern is obfuscated as follows:
1 aload_1 2 iload_2 3 iconst_1 4 ishl 5 dup2 6 iconst_1 7 iadd 8 baload 9 dup_x2 10 pop 11 baload 12 ixor
There are no extra configuration parameters necessary to run this obfuscator.