Wednesday, March 16, 2011

Extract HTML from Android webview

Do you have a WebView in your Android app and want to fiddle with the HTML?

Injecting javascript in android webview

It's easy: make sure JavaScript is enabled, then simply load a URL that starts with javascript:
mBrowser = new WebView(context);
mBrowser.getSettings().setJavaScriptEnabled(true);
mBrowser.loadUrl("javascript:document.getElementById('ad_container').style.display='none';");
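
Hand-written javascript: strings make it easy to drop a quote or a parenthesis. Below is a minimal sketch of a helper that builds such a URL. The class name JsUrls and its methods are my own invention for illustration and not part of the Android SDK. The script is wrapped in a function that returns nothing, because in browsers a javascript: URL that evaluates to a value can replace the current page with that value; I am assuming the WebView can behave the same way.

```java
/** Hypothetical helper (not part of the Android SDK) that builds javascript: URLs. */
final class JsUrls {
    private JsUrls() {}

    /** Escapes backslashes and single quotes so the text is safe inside a single-quoted JS string. */
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Returns a javascript: URL that hides the element with the given id, if it exists. */
    static String hideElement(String elementId) {
        return "javascript:(function(){"
                + "var e=document.getElementById(" + quote(elementId) + ");"
                + "if(e){e.style.display='none';}"
                + "})();";
    }
}
```

It would then be used as mBrowser.loadUrl(JsUrls.hideElement("ad_container"));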

In most cases you want to load the actual web page first and then inject the JavaScript. This is done with a WebViewClient that overrides onPageFinished.

mBrowser.setWebViewClient(new WebViewClient() {
   @Override
   public void onPageFinished(WebView view, String url) {
      // check url
      if (SOME_WEBPAGE.equals(url)) {
        view.loadUrl(JAVASCRIPT_CODE);
      }
   }
});
mBrowser.loadUrl(SOME_WEBPAGE);
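
One thing to watch with the url check: the string passed to onPageFinished is not always identical to the one you loaded, for example after a redirect or when a trailing slash is added. Here is a rough sketch of a more forgiving comparison in plain Java. The class UrlMatcher and the method samePage are hypothetical names, and ignoring the fragment while comparing scheme, host, port, path and query is just one possible choice.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper for comparing the URL reported by onPageFinished with the requested one. */
final class UrlMatcher {
    private UrlMatcher() {}

    /** True when both URLs agree on scheme, host, port, path and query; the fragment is ignored. */
    static boolean samePage(String expected, String actual) {
        try {
            return normalize(expected).equals(normalize(actual));
        } catch (IllegalArgumentException e) {
            return false; // one of the strings is not a parseable URL
        }
    }

    private static String normalize(String url) {
        URI u = URI.create(url).normalize();
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        // Treat an empty path and "/" as the same page.
        String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();
        String query = u.getQuery() == null ? "" : "?" + u.getQuery();
        return scheme + "://" + host + ":" + u.getPort() + path + query;
    }
}
```

Inside onPageFinished the check would then read if (UrlMatcher.samePage(SOME_WEBPAGE, url)) instead of a plain equals.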

Extracting HTML from Android webview

Taking it one step further - let's get HTML from a page and use it in our program. This is accomplished by using a JavascriptInterface with our webview.

mBrowser.getSettings().setJavaScriptEnabled(true);
// add javascript interface with example method
mBrowser.addJavascriptInterface(new Object() {
   @JavascriptInterface // required on API 17 and later
   public void setTitle(String html) {
      mTitle = html;
   }
}, "MY_JS");
// add webview client that calls javascript interface when page is loaded
mBrowser.setWebViewClient(new WebViewClient() {
   @Override
   public void onPageFinished(WebView view, String url) {
       view.loadUrl("javascript:window.MY_JS.setTitle(document.getElementsByTagName('title')[0].innerHTML);");
   }
});

mBrowser.loadUrl(SOME_WEBPAGE);
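
Keep in mind that the interface method is called from a background thread of the WebView, not from the UI thread, and only some time after loadUrl returns. Below is a minimal sketch of a thread-safe holder for the extracted value. HtmlReceiver is a hypothetical class of my own, written in plain Java so it can be tried outside Android. On API 17 and later the method called from JavaScript would also need the @JavascriptInterface annotation, which is left out here to avoid the Android dependency.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for a value delivered by the page's JavaScript. */
final class HtmlReceiver {
    private final AtomicReference<String> html = new AtomicReference<>();
    private final CountDownLatch received = new CountDownLatch(1);

    // On Android, annotate this method with @JavascriptInterface (API 17+).
    public void setHtml(String value) {
        html.set(value);
        received.countDown();
    }

    /** Waits up to timeoutMillis for a value; returns null on timeout or interruption. */
    public String await(long timeoutMillis) {
        try {
            return received.await(timeoutMillis, TimeUnit.MILLISECONDS) ? html.get() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

An instance could be registered with addJavascriptInterface under the name MY_JS and filled from the page with window.MY_JS.setHtml(document.documentElement.outerHTML). Call await from a worker thread only, since blocking the UI thread would freeze the app.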

That's it. The combination of WebViewClient and JavascriptInterface is very powerful. The drawback is that I have not found an easy way of debugging the JavaScript: if your JavaScript fails, it fails silently.

The nice thing is that you can test your JavaScript in Firefox using Firebug on your main computer before adding the code to your Android project.

13 comments:

  1. If you just want to download the HTML from a URL, it feels like there are easier ways to do that, but that isn't your main goal I guess?
    Otherwise I usually use HttpGet but then you can't navigate and then just get the HTML...

  2. If you don't need/want all the features from webview but just want the http response you may want to consider using AndroidHttpClient (API 8 [2.2-Froyo]) or DefaultHttpClient (API 1).

  3. True, you do have some extra features in the webview. I rest my case :)

  4. Don't rest it just yet =) It depends on what you need. A lightweight HttpClient will be a much faster option if you just want to parse some html contents. The webview will load css, javascript (unless 2.2 - blockNetworkLoad) etc. increasing network load. But in my case, for TravAlert, having the webview manage authentication and session cookies makes it all worth it.

  5. The guys above do not understand one thing: it is damn easy to fetch the content of some remote address. The guide here is essential when you use WebView in your application and want to debug the received HTML page, while the request has to be made via the WebView and not with some AndroidHttpClient, because we need the exact browser headers and proxies to be used.

  6. "drawback is that I have not found an easy way of debugging javascript"

    Add a WebChromeClient to your WebView and override the WebChromeClient's onConsoleMessage method.

    You can re-direct all WebView console output to the android logcat.

    Easy example here:

    http://developer.android.com/guide/webapps/debugging.html

    warwound

  7. Yes, but still - console output is a very limited and time consuming way of debugging.

  8. When I paste the above code into my program, there are many errors, so please share this code as a project or post the entire code.

    Please, it's urgent.

  9. Can you give a working example?
    I know the above concept works fine, but I cannot figure out how to implement it. Beginners cannot follow the above, so can you please give a working example?

    Thank You Very Much

    KIRAN

  10. This comment has been removed by a blog administrator.

  11. Thanks man, just what I was looking for! :)

    Replies
    1. This comment has been removed by the author.

    2. Could you please give a working example? I am a beginner and I have tried many examples.
