Algorithm and Data Structure – Counting Anagrams

Recently I played an algorithm and data structure question which is called Counting Anagrams, its requirement is like:

Input

2

abdcghbaabcdij     bcda

bbbababaaabbbb     ab

Output

2 0 8

6 2 3 4 5 6 9

As first line of output, because there are two anagrams of “bcda” in “abdcghbaabcdij” in the first case, the first number is number of anagrams and then index of anagrams, similar case as second line.

The most straightforward  way to solve this problem is to search certain length (same length as second parameter in each line) substring from each applicable index in the first parameter and then sort each other to check whether or not they are anagrams. Actually we can check if they have same number of characters before sort them, so this is the solution v0:

   static void countAnagrams(String[] arr) {

       ArrayList<ArrayList<Integer>> idxLists = new ArrayList<ArrayList<Integer>>();

       if (arr.length == 0) {

           return;

       }

       for (int i = 0; i < arr.length; i++) {

           String[] abStr = arr[i].split(” “);

           String aStr = abStr[0];

           String bStr = abStr[1];

           String sortedBStr = StrSort(bStr);

           int lenA = aStr.length();

           int lenB = bStr.length();

           ArrayList<Integer> idxList = new ArrayList<Integer>();

           for (int j = 0; j < lenA – lenB + 1; j++) {

               String curStr = aStr.substring(j, j+lenB);

               if (checkCharNum(curStr, sortedBStr)) {

                   String sortedAStr = StrSort(curStr);

                   if (sortedBStr.equals(sortedAStr)) {

                       idxList.add(j);

                   }                    

               }

           }

           idxLists.add(idxList);

       }

       for (int i = 0; i < idxLists.size(); i++) {

           ArrayList<Integer> idxList = idxLists.get(i);

           System.out.print(idxList.size() + ” “);

           for (int j = 0; j < idxList.size(); j++) {

               System.out.print(idxList.get(j) + ” “);

           }

           System.out.print(“\n”);

       }

   }

 

static String StrSort(String s) {

  char[] c = s.toCharArray();

  Arrays.sort(c);

  return new String(c);

}

static Boolean checkCharNum(String a, String b) {

       int[] aChars = new int[256];

       int[] bChars = new int[256];

       for (int i = 0; i < a.length(); i++) {

           aChars[a.charAt(i)]++;

           bChars[b.charAt(i)]++;

       }

       for (int i = 0; i < 256; i++) {

           if (aChars[i] != bChars[i]) {

               return false;

           }

       }

       return true;

   }

But do we have to sort the two strings if they have the same number of all characters? The answer is definitely no. Moreover, it’s better to check the length before check character number and it will take less time since most sorting algorithm still takes O(nlogn) time, in that case, we can remove sorting code and have solution v1:

   static void countAnagrams(String[] arr) {

       ArrayList<ArrayList<Integer>> idxLists = new ArrayList<ArrayList<Integer>>();

       if (arr.length == 0) {

           return;

       }

       for (int i = 0; i < arr.length; i++) {

           String[] abStr = arr[i].split(” “);

           String aStr = abStr[0];

           String bStr = abStr[1];

           int lenA = aStr.length();

           int lenB = bStr.length();

           if (lenA < lenB) {

               continue;

           }

           ArrayList<Integer> idxList = new ArrayList<Integer>();

           for (int j = 0; j < lenA – lenB + 1; j++) {

               String curStr = aStr.substring(j, j+lenB);

               if (curStr.length() == lenB && checkCharNum(curStr, bStr)) {

                   idxList.add(j);

               }

           }

           idxLists.add(idxList);

}

 

       for (int i = 0; i < idxLists.size(); i++) {

           ArrayList<Integer> idxList = idxLists.get(i);

           System.out.print(idxList.size() + ” “);

           for (int j = 0; j < idxList.size(); j++) {

               System.out.print(idxList.get(j) + ” “);

           }

           System.out.print(“\n”);

       }

   }

 

   static Boolean checkCharNum(String a, String b) {

       int[] aChars = new int[256];

       int[] bChars = new int[256];

       for (int i = 0; i < a.length(); i++) {

           aChars[a.charAt(i)]++;

           bChars[b.charAt(i)]++;

       }

       for (int i = 0; i < 256; i++) {

           if (aChars[i] != bChars[i]) {

               return false;

           }

       }

       return true;

   }

There are some big size test cases still cannot pass due to long time running which pushes us to keep improving the algorithm  and shorten the execution time. After analysis, it takes tons of time to check character number for two strings in each round iteration, but wait, do we really need to calculate every time? Absolutely no, because we already check it in last round, so we can make fully use of previous result and just check the new character in the current string and remove old character one. Solution v2 is like this:

static void countAnagrams(String[] arr) {

       ArrayList<ArrayList<Integer>> idxLists = new ArrayList<ArrayList<Integer>>();

       if (arr.length == 0) {

           return;

       }

       for (int i = 0; i < arr.length; i++) {

           String[] abStr = arr[i].split(” “);

           String aStr = abStr[0];

           String bStr = abStr[1];

           int lenA = aStr.length();

           int lenB = bStr.length();

           if (lenA < lenB) {

               continue;

           }

           ArrayList<Integer> idxList = new ArrayList<Integer>();

           int[] bChars = new int[256];

           for (int m = 0; m < b.length(); m++) {

              bChars[b.charAt(m)]++;

           }

           for (int j = 0; j < lenA – lenB + 1; j++) {

               String curStr = aStr.substring(j, j+lenB);

               int[] aChars = new int[256];

               if (j == 0) {

                   for (int m = 0; m < curStr.length(); m++) {

                      aChars[curStr.charAt(m)]++;

                   }

               } else {

                   aChars[curStr.charAt(j-1)]–;

                   aChars[curStr.charAt(j)]++;

               }

               int n = 0;

               for (n = 0; n < 256; n++) {

                   if (aChars[n] != bChars[n]) {

                       return break;

                   }

               }

               

               if (n == 256) {

                   idxList.add(j);

               }

           }

           idxLists.add(idxList);

}

 

       for (int i = 0; i < idxLists.size(); i++) {

           ArrayList<Integer> idxList = idxLists.get(i);

           System.out.print(idxList.size() + ” “);

           for (int j = 0; j < idxList.size(); j++) {

               System.out.print(idxList.get(j) + ” “);

           }

           System.out.print(“\n”);

       }

   }

 

   static Boolean checkCharNum(String a, String b) {

       int[] aChars = new int[256];

    int[] bChars = new int[256];

       for (int i = 0; i < a.length(); i++) {

           aChars[a.charAt(i)]++;

           bChars[b.charAt(i)]++;

       }

       for (int i = 0; i < 256; i++) {

           if (aChars[i] != bChars[i]) {

               return false;

           }

       }

       return true;

   }

Finally, we got the optimized answer by improving it again and again, that’s a process to make the program clean and simple. Like KISS principle in UNIX philosophy, in the process to simplify the algorithm and we dry the code and find a better solution for the problem. If you have time, don’t forget to refactor code and reflect on the abstraction, you will benefit from this habit.

 

Advertisements

About liyao13

Yao Li is a web and iOS developer, blogger and he has a passion for technology and business. In his blogs, he shares code snippets, tutorials, resources and notes to help people develop their skills. Donate $5 to him for a coffee with PayPal at About Me page and read more professional and interesting technical blog articles. Follow him @Yaoli0615 at Twitter to get latest tech updates.
This entry was posted in CS Research&Application, Uncategorized and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s