Sunday, 25 March 2012

Android OCR tutorial - image to text

This tutorial shows how to use the Tesseract OCR library in an Android application. Tesseract is an open-source OCR engine originally developed by HP.

1. Download the Tesseract library for Android (tess-two) from https://github.com/rmtheis/tess-two/downloads. Download the .zip archive
    on Windows or the .tar.gz archive on Linux.

2. Software requirements
    - Eclipse
    - Java JDK
    - Android SDK
    - Android NDK
    - Cygwin (for Windows users)
    - Apache Ant

3. Windows users: make sure Cygwin is installed (you can download it from http://www.cygwin.com/). During the Cygwin installation, also select the gcc-core, gcc-g++, make and swig packages.

4. Download Apache Ant from http://ant.apache.org/bindownload.cgi. Choose the .zip archive on Windows or the .tar.bz2 archive on Linux.

5. Unzip Apache Ant and add its bin folder to the PATH environment variable (mine is C:\apache-ant-1.8.3\bin).
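Step 5 only helps if Ant's bin folder really ends up on the PATH your shell sees; otherwise ant release in step 6g fails with "'ant' is not recognized as an internal or external command" or "command not found". The sketch below is one way to check and fix this for the current Cygwin session. It assumes Ant was unzipped to C:\apache-ant-1.8.3 (written /cygdrive/c/apache-ant-1.8.3 in Cygwin), and path_has_dir is a hypothetical helper, not part of Ant or Cygwin. The permanent setting is still made in the Windows environment-variable dialog.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is one of the entries in the
# colon-separated list $2 (defaults to the current $PATH), fail otherwise.
path_has_dir() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed Ant location: C:\apache-ant-1.8.3 written in Cygwin form.
ANT_HOME=/cygdrive/c/apache-ant-1.8.3
export ANT_HOME

# Prepend Ant's bin folder for this shell session only, if it is missing.
if ! path_has_dir "$ANT_HOME/bin"; then
  PATH="$ANT_HOME/bin:$PATH"
  export PATH
fi

path_has_dir "$ANT_HOME/bin" && echo "Ant bin folder is on PATH"
```

Typing ant -version afterwards is a quick way to confirm that the shell now finds Ant.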


6. Open Cygwin (Windows users) or a terminal (Linux users) and run:
     a. cd <project-directory>/tess-two
     b. export TESSERACT_PATH=${PWD}/external/tesseract-3.01
     c. export LEPTONICA_PATH=${PWD}/external/leptonica-1.68
     d. export LIBJPEG_PATH=${PWD}/external/libjpeg
     e. ndk-build (Windows users: /cygdrive/<ndk-directory>/ndk-build, with a lower-case drive letter and
        forward slashes, e.g. /cygdrive/c/android-ndk-r8d/ndk-build)
     f. android update project --path . (Windows users: Cygwin sometimes cannot execute this command;
        if so, run it from the Windows command prompt instead.)
        Note: the "." after --path must be included in the command.
     g. ant release (if you get an error such as "tools.jar not found", set the JAVA_HOME environment
        variable to your JDK folder; mine is C:\Program Files\Java\jdk1.7.0)
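Steps 6b to 6d and 6g only work if the variables point at folders that actually exist, and a mistake there tends to surface later as a confusing build error. The following pre-flight sketch, meant to be run in the same shell after 6a to 6d, checks those variables before ndk-build. check_step6_env is a hypothetical name; the function assumes the external/ layout used in steps 6b to 6d and a JDK 1.7-era install that ships lib/tools.jar, the file behind the "tools.jar not found" error mentioned in 6g. It also assumes JAVA_HOME holds a path the shell can test, which in Cygwin may need to be written in /cygdrive/... form.

```shell
#!/bin/sh
# Hypothetical pre-flight check for step 6; run it in the same shell after
# steps 6a-6d. It only reports problems, it does not fix them.
check_step6_env() {
  rc=0
  # Steps 6b-6d: each variable must name an existing folder.
  for var in TESSERACT_PATH LEPTONICA_PATH LIBJPEG_PATH; do
    eval "dir=\${$var:-}"          # value of the variable whose name is $var
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "$var is unset or not a folder: '$dir'" >&2
      rc=1
    fi
  done
  # Step 6g: "tools.jar not found" typically means JAVA_HOME is unset or
  # points at a JRE; a JDK 1.7 install (assumed here) ships lib/tools.jar.
  if [ ! -f "${JAVA_HOME:-}/lib/tools.jar" ]; then
    echo "JAVA_HOME does not look like a JDK folder: '${JAVA_HOME:-}'" >&2
    rc=1
  fi
  return "$rc"
}

# Typical use (the NDK path is only an example):
#   check_step6_env && /cygdrive/c/android-ndk-r8d/ndk-build
```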

7. Start Eclipse. Right-click in the Package Explorer and choose Import >> General >> Existing Projects into Workspace >>
    Next >> Select root directory >> Browse to the tess-two folder >> Finish.
    The tess-two project now appears in the Package Explorer.

8. Right-click the tess-two project >> Android Tools >> Fix Project Properties. Then right-click >> Properties >>
    Android >> check Is Library.
    Download the Simple Android OCR app from https://github.com/GautamGupta/Simple-Android-OCR
    and import its folder into the Package Explorer the same way as in step 7.

9. Right-click the Simple-Android-OCR project >> Properties >> Android >> Add (in the Library section) >> select tess-two >> OK.
   
10. Run the app. Good luck!



References
[1] http://gaut.am/making-an-ocr-android-app-using-tesseract/
[2] http://ant.apache.org/bindownload.cgi
[3] http://wolfpaulus.com/journal/android-and-ocr
[4] http://rmtheis.wordpress.com/2011/08/06/using-tesseract-tools-for-android-to-create-a-basic-ocr-app/
[5] https://github.com/rmtheis/tess-two


81 comments:

  1. You can download the tess-two library from this link:
    http://www.4shared.com/folder/SinVRg1O/_online.html

  5. Unzip the archive from http://www.4shared.com/folder/SinVRg1O/_online.html, then:

    - In the Windows command prompt, cd to the extracted tess-two folder and type "/android-sdk-windows/tools/android update project --path ."
    - Create new environment variables (Computer -> Properties -> Advanced): JAVA_HOME with the value C:\Program Files\Java\jdk1.7.0, ANT_HOME with the value C:\apache-ant-1.8.4, and append ;C:\Program Files\Java\jdk1.7.0\bin;C:\apache-ant-1.8.4\bin to PATH.
    - Then, in Cygwin, cd to the tess-two folder and type "ant release".
    - Sometimes the new environment variables only take effect after you restart your PC.
    - Then follow the tutorial above from step 7.

  6. The android-sdk-windows folder is usually in C:\, so after you cd to the tess-two folder, type "C:\android-sdk-windows\tools\android update project --path ." if you are using the command prompt, or "/cygdrive/c/android-sdk-windows/tools/android update project --path ." if you are using Cygwin. You can try either one.

  7. How can I use another language?

  8. I haven't tried other languages yet, but this link might be useful: http://vannait.blogspot.com/2009/06/how-to-train-tesseract-ocr.html

  9. Can you make a video for this tutorial? Sorry, I'm a beginner with Android.

  10. Later we will provide the video for you

  11. I'm a Windows user and I'm getting this error at step 6e:

    E:/adt-bundle-windows-x86_64/android-ndk-r8d/ndk-build
    SharedLibrary : liblept.so
    E:/adt-bundle-windows-x86_64/android-ndk-r8d/toolchains/arm-linux-androideabi-4.6/prebuilt/windows/bin/../lib/gcc/arm-linux-androideabi/4.6/../../../../arm-linux-androideabi/bin/ld.exe: error: cannot open ./obj/local/armeabi/libgnustl_static.a: Permission denied
    collect2: ld returned 1 exit status
    /cygdrive/e/adt-bundle-windows-x86_64/android-ndk-r8d/build/core/build-binary.mk:397: recipe for target `obj/local/armeabi/liblept.so' failed
    make: *** [obj/local/armeabi/liblept.so] Error 1

  12. Hi Firas, I found a post about the "Permission denied" error on libgnustl_static.a. Check this: http://stackoverflow.com/questions/12469711/building-tesseract-with-android-ndk

  13. Would you tell me what this command is for?
    chmod 777 obj/local/armeabi/libgnustl_static.a

    I don't know how to thank you for your thread and your quick response :-)

  16. Hi Firas, chmod sets the permissions of a file, e.g. read, write and execute, for the owner (you), the group, and others (anyone else who accesses the file, including programs). 777 means anyone can do anything with the file. If the suggestion does not work, try chmod -R 777 /cygdrive/(eclipse workspace folder).

    I found it at http://stackoverflow.com/questions/11551742/ndk-build-error-with-cygwin
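For reference, each digit of a mode such as 777 is the sum of read (4), write (2) and execute (1), and the three digits apply to the owner, the group and everyone else, in that order. The hypothetical helper mode_to_rwx below is only a sketch that expands a three-digit mode into the rwx string ls -l prints; chmod itself accepts both notations, so chmod 777 file and chmod u=rwx,g=rwx,o=rwx file do the same thing.

```shell
#!/bin/sh
# Hypothetical helper: expand a three-digit octal mode (as passed to chmod)
# into the rwx form shown by ls -l. Digits are owner, group, others;
# within each digit 4 = read, 2 = write, 1 = execute.
mode_to_rwx() {
  case "$1" in
    [0-7][0-7][0-7]) ;;
    *) return 1 ;;                 # only plain three-digit modes are handled
  esac
  out=""
  rest="$1"
  while [ -n "$rest" ]; do
    digit="${rest%"${rest#?}"}"    # first remaining digit
    rest="${rest#?}"               # drop it from the front
    if [ $((digit & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((digit & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((digit & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}

mode_to_rwx 777   # prints rwxrwxrwx
mode_to_rwx 644   # prints rw-r--r--
```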

  17. Building the Tesseract library with the NDK is a bit tiring. You can just download the already built library from http://www.4shared.com/folder/SinVRg1O/_online.html and then follow wany's suggestion (see the earlier comments).

  18. Thank you very much. I did it with chmod -R 777 /cygdrive/(eclipse workspace folder) and chmod -R 777 on the files without permission.

    Now my engine works perfectly.
    Do you know how to add a continuous mode, or how to crop the image so the user can choose the text in it?

  20. I don't know how to do step 6; I'm not familiar with Cygwin. Could you please help me?

  21. Hi Rex,
    If you use Windows, you need to install Cygwin to do step 6.
    Then run Cygwin and type the commands from step 6. For example, step 6a is cd <project-directory>/tess-two; my workspace is D:\workspace\opencv\tess-two, so the command I type is cd /cygdrive/d/workspace/opencv/tess-two. Hope this helps.
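The conversion described here is mechanical: the drive letter becomes a lower-case folder under /cygdrive, and the backslashes become forward slashes. Cygwin ships the cygpath utility for exactly this (cygpath -u 'D:\workspace\opencv\tess-two'); the function below is a hypothetical stand-in, written only to show the rule, and it assumes an absolute path that starts with a drive letter, such as D:\workspace.

```shell
#!/bin/sh
# Hypothetical stand-in for `cygpath -u`: convert an absolute Windows path
# such as D:\workspace\tess-two into Cygwin's /cygdrive/d/workspace/tess-two.
# Assumes the path starts with a drive letter followed by a colon.
win_to_cygdrive() {
  drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')   # skip "D:", flip slashes
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win_to_cygdrive 'D:\workspace\opencv\tess-two'   # prints /cygdrive/d/workspace/opencv/tess-two
win_to_cygdrive 'C:\android-ndk-r8d\ndk-build'   # prints /cygdrive/c/android-ndk-r8d/ndk-build
```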

  22. I don't know why, but when my application runs, I open the camera and take a picture, and when I save it the application crashes with the message that it stopped unexpectedly. Any idea why?

  23. Oh, and when I ran logcat in Eclipse I saw something like: could not find class 'com.googlecode.tesseract.android.TessBaseAPI' referenced from method com.datumdroid.android.ocr.simple.SimpleAndroidOCRActicity.picturetaken. Just so you know the specifics of the problem! :)

  24. Why is the recognized text sometimes a bunch of random characters? I get this result even when there are no characters in the picture.

  25. The OCR tutorial on this blog is quite simple and sometimes unstable. For a more stable app, follow this tutorial: http://gaut.am/making-an-ocr-android-app-using-tesseract/

    Replies
    1. Thank you! But what is the difference between these two tutorials?

  27. Can you send us a video tutorial? I have done all the steps, but the OCR app shows a 'force close' message.

  28. Please make a video on YouTube!

  29. Yup it is working for me... thanks kurup ;)

  30. I have an error when building with the NDK. This is the error:
    jni/../external/libjpeg/jidctfst.S: Assembler messages:
    jni/../external/libjpeg/jidctfst.S:66: Error: missing ')'
    jni/../external/libjpeg/jidctfst.S:66: Error: garbage following instruction -- ` pld (r2,#0)'
    jni/../external/libjpeg/jidctfst.S:259: Error: missing ')'
    jni/../external/libjpeg/jidctfst.S:259: Error: garbage following instruction -- `pld (sp,#32)'
    jni/../external/libjpeg/jidctfst.S:271: Error: missing ')'
    jni/../external/libjpeg/jidctfst.S:271: Error: garbage following instruction -- `pld (ip,#32)'
    /cygdrive/d/Programming/Programs/android/android-ndk-r8d/build/core/build-binary.mk:267: recipe for target `obj/local/armeabi/objs/jpeg/jidctfst.o' failed
    make: *** [obj/local/armeabi/objs/jpeg/jidctfst.o] Error 1


    I already checked this file and didn't find the missing ')'.

    Can anyone help?

  31. I'm a Windows user, too,
    and I'm getting this error at step 6f:

    Cygwin cannot execute the command "android update projrct --path ."
    Here's what I got: [-bash: android:Command can not find]

    So I used the command prompt to execute the same command, and I got " 'android' is not internal or external command, operable program or batch file"

    Did I make a mistake in the commands?

    Replies
    1. Oh! Finally! I got it!!
      But now I have a new error to solve.
      After I execute "android update projrct --path ."
      I get:
      Error: Expected verb after global parameters but found 'projrct' instead.

    2. I'm not sure about this either, but maybe it's because you misspelled "project" as "projrct" in the command.

    3. I can't believe I made such a mistake...
      THANNNNK you!!!!
      Still one small question: why do I need to download Apache Ant? What is it for?

    4. How did you fix this error? I'm stuck at this step. Please help.

  32. $ /cygdrive/c/adt-bundle-windows-x86_64-20130219/adt-bundle-windows-x86_64-20130219/sdk/tools/android.bat update project --path .
    Error: The project either has no target set or the target is invalid.
    Please provide a --target to the 'android.bat update' command.

    I am not sure how to set the target.

    Any help?

    Replies
    1. /cygdrive/c/adt-bundle-windows-x86_64-20130219/adt-bundle-windows-x86_64-20130219/sdk/tools/android.bat update project -p . --target android-17

      worked

    2. I am getting this problem. Can you please help me out?
      /cygdrive/f/adt-bundle-windows-x86_64-20130522/sdk/tools
      $ android.bat update project --p. --target android-10
      -bash: android.bat: command not found

  33. Hi, I am stuck on step 6. In the ndk-build part I am getting the following error:



    Manish@Kadaba-PC /cygdrive/c/Users/Manish/Downloads/tess-two-master
    $ /cygdrive/C:\android-ndk-r8d/ndk-build
    -bash: /cygdrive/C:android-ndk-r8d/ndk-build: No such file or directory

    Manish@Kadaba-PC /cygdrive/c/Users/Manish/Downloads/tess-two-master
    $ /cygdrive/C:/android-ndk-r8d/ndk-build
    -bash: /cygdrive/C:/android-ndk-r8d/ndk-build: No such file or directory



    Someone please help me out here. Thank you.

  34. Hi Manish.
    Instead of /cygdrive/C:\android-ndk-r8d/ndk-build, try:
    /cygdrive/c/android-ndk-r8d/ndk-build
    Hope it works :)

    Replies
    1. Hey kurup, one question:
      will it recognise power functions and other math expressions like +, -, = etc.?

  35. =============================================
    Nicky Valiant@Nicx-7 /cygdrive/c/android/android-ndk-r8e
    $ cd /cygdrive/c/android/android-ndk-r8e/

    Nicky Valiant@Nicx-7 /cygdrive/c/android/android-ndk-r8e
    $ ndk-build
    -bash: ndk-build: command not found
    ====================OR==========================
    Nicky Valiant@Nicx-7 /cygdrive/c/android/android-ndk-r8e
    $ cd /cygdrive/c/android/android-ndk-r8e/ndk-build
    -bash: cd: /cygdrive/c/android/android-ndk-r8e/ndk-build: Not a directory
    ==============================================
    My problem is the same as Manish's, at step 6e.

  36. I am downloading the NDK now (a heavy download). Meanwhile, could I get an APK of this project? Not the source, the APK file. :)

  37. After scanning with OCR, I want to save the recognized text to a .txt or .png file for other purposes. How can I do that?

  38. Thanks for the tutorial. But why do you need Cygwin to build the project? On the official tess-two site, they only need the Android NDK.

    Replies
    2. OK, I finally built it (without the Cygwin terminal). So, which part should I import into my project? Do I need to include the whole project?

  39. Thanks a lot, it is working, but it does not recognize the characters. I want to read the characters on a lottery ticket. How can I improve it?

  40. While importing the project into the workspace, it shows ERROR: resource directory 'S:\tess-two-master\tess-two\res' does not exist. What should I do to solve this issue?

  41. Can you share your whole project with me? :D

  42. Do you have the video? It's hard for me to follow as a beginner. Thanks~

    Replies
    1. http://androidtesstwo.blogspot.com/2014/03/making-android-ocr-app-using-tess-two.html?spref=fb

      No need for the NDK, Ant or Cygwin; you just need to import it into Eclipse.

  43. Hello, I made a beginner's installation guide for the Android NDK and the Tesseract library on Windows 7 (64-bit). I am a beginner too, so I assume the steps are similar on 32-bit.

    I uploaded it to my GitHub:
    https://github.com/GiorgosPap/androidndk-tesslib

    I hope you find it useful.

  44. Hey, please tell me any library or API to convert images to text for PhoneGap or Sencha Touch,
    because I am using a cross-platform framework for mobile development.

  45. Hi, I have done everything up to step 6f and it works. But at step 6g (ant release), it shows
    " 'ant' is not recognized as an internal or external command, operable program or batch file. "
    So, what should I do to continue?
    Please help. Thanks.

  46. How many fonts does it detect without additional training?

  47. Hey, can you please tell me: do I run "android update project --path ." in the same directory where I executed ndk-build, or do I need to change directory first?
    Please reply, I need urgent help. I want to know how to proceed to step 6f.

  48. Nice tutorial, it helped me a lot.

  49. Replies
    1. http://androidtesstwo.blogspot.com/ has an easy solution. You don't need the NDK, Ant, or environment variables.

    2. Android OCR tess-two library without ndk-build: Making an Android OCR app using tess-two v3.03.

      http://androidtesstwo.blogspot.com/

  50. What Android NDK version is required for OCR?

    Replies
    1. http://androidtesstwo.blogspot.com/ has an easy solution.

  51. Ultimate solution. See my blog: http://androidtesstwo.blogspot.com/

    You will not need Ant, the NDK, or environment variables. A very easy way.

    Replies
    1. Francis Solomon, you made it very easy and simple. Thanks for your effort. But the result is completely different from the text in the captured image. Is there a way to make it more accurate?

  52. Good day. I would just like to ask whether the simple OCR Android app you provided here can read any type of font, like Times New Roman, Calibri, Comic Sans, etc. I was able to run the application, but sometimes the results are not accurate: the letter "B" sometimes becomes the letter "O". If it does read any type of font, then I may have gone wrong at some point, or I might be missing something needed to make it work perfectly. I am new to Android programming but I am really interested in learning. Please help me.

    Replies
    1. If you are using eng.traineddata, I'd say it gives only about 70% accuracy.
      If you want accurate results, you need to train Tesseract on the fonts you want your application to read.

  54. Francis Solomon, you made it very easy and simple. Thanks for your effort. But the result is completely different from the text in the captured image. Is there a way to make it more accurate?

    Replies
    1. You can try training Tesseract on the fonts you want your app to read.

  56. Is it possible to support multiple languages in a single program?
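Tesseract 3.02 and later (newer than the tesseract-3.01 sources exported in step 6b) accept several languages joined with "+", for example eng+fra, as long as a .traineddata file is installed for each one. Before trying that, the sketch below checks that every requested language file is present. It assumes the app hands tess-two's TessBaseAPI.init(datapath, language) a data path whose tessdata/ subfolder holds the files; check_tessdata is a hypothetical helper, and the example path is made up.

```shell
#!/bin/sh
# Hypothetical helper: check that every language in a "+"-separated list
# (e.g. "eng" or "eng+fra") has <datapath>/tessdata/<lang>.traineddata.
# Assumption: <datapath> is the folder passed to TessBaseAPI.init(datapath, language).
check_tessdata() {
  rc=0
  for lang in $(printf '%s' "$2" | tr '+' ' '); do
    if [ ! -f "$1/tessdata/$lang.traineddata" ]; then
      echo "missing: $1/tessdata/$lang.traineddata" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (hypothetical path): check_tessdata /sdcard/myocr eng+fra
```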

  57. UNDERDOG@LenovoG40 /cygdrive/c/Android/tess-two
    $ /cygdrive/c/Android/android-ndk-r10d/ndk-build
    Android NDK: Could not find application project directory !
    Android NDK: Please define the NDK_PROJECT_PATH variable to point to it.
    /cygdrive/c/Android/android-ndk-r10d/build/core/build-local.mk:148: *** Android NDK: Aborting . Stop.

    Can you help me?
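ndk-build prints this message when it cannot work out which project to build, which most likely means the shell is not in the folder that contains jni/ (for example, an outer tess-two-master folder instead of the inner tess-two folder). As a sketch, assuming ndk-build locates the project through jni/Android.mk, the hypothetical check_ndk_project function below verifies that file before you build. The message's own suggestion, passing NDK_PROJECT_PATH=<folder> on the ndk-build command line, is the alternative when you cannot cd there.

```shell
#!/bin/sh
# Hypothetical pre-flight check for ndk-build (step 6e).
# Assumption: ndk-build finds the project via jni/Android.mk, so the folder
# you build from must contain that file.
check_ndk_project() {
  dir="${1:-$PWD}"
  if [ -f "$dir/jni/Android.mk" ]; then
    echo "OK: $dir contains jni/Android.mk"
    return 0
  fi
  echo "No jni/Android.mk in $dir; cd into the folder that contains jni/," >&2
  echo "or run: ndk-build NDK_PROJECT_PATH=<that folder>" >&2
  return 1
}

# Typical use (NDK path taken from the log above):
#   check_ndk_project && /cygdrive/c/Android/android-ndk-r10d/ndk-build
```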

  59. Description | Resource | Path | Location | Type
    The import java.io.FileInputStream is never used | SimpleAndroidOCRActivity.java | /Simple-Android-OCR/src/com/datumdroid/android/ocr/simple | line 4 | Java Problem

    and the next one is:

    The import java.util.zip.GZIPInputStream is never used | SimpleAndroidOCRActivity.java | /Simple-Android-OCR/src/com/datumdroid/android/ocr/simple | line 9 | Java Problem

    Please help AS SOON AS POSSIBLE.

  60. Hello, amazing information, dude. Thanks for sharing this nice information with us. OCR and scanner pls
