Web page interoperability with .NET browser controls

Everyone knows that the best way to manipulate or extract data from a hypertext page is to use JavaScript. The langue is built entirely to serve that purpose so it’s natural to think about it every time someone tell you to get an element’s text or skip unnecessary screens. Firefox’s XUL even relies on the language to control the entire application’s behavior.

That method does well most of the time but it requires that you have access to the JavaScript source and can modify the page, which is not the case in some situation:

  1. Rapidshare makes you wait for two minutes or so when you want to get some file from them for free. It’s better than before when you have to go past some feline CAPTCHA but you still have to click on the download button after the timer expire, which means you can’t click on a Rapidshare link and leave it to download itself. Will Rapidshare allow you to edit the download page to insert the JavaScript that will click the button for you? No!
  2. You want to capture stock value from your finance information agency’s web site and save it into an Excel file for later reference, what should you do? Ask Yahoo Finance to modify its web page to send some fancy XMLRequest to your server every time it updates? And even if they do so you’ll have to make a listening server for those request on your side.

Of course there are other solution to those problem other than coding it on your own. There’s an extension for Firefox called Skipscreen that will automatically download the Rapidshare file for you or you con hire a better information provider that have built-in analysis functions on their site for you.

But if you are a control freak like me and want to make sure everything work exactly as what it is intended for then crawling through thousands of line of Skipscreen’s source code can be a bit of a pain (mind you that JavaScript is neither a strong typed or object oriented language so have fun visualizing variables and objects in your head). About the stock, can you be sure that your provider doesn’t tail gate your order? (e.g. You decide to buy some interesting stock and order them to do so, they find that stock interesting too and buy the same stock before actually place your order so they get the better price)

With Internet Explorer (e.g. the default browser control)

It’s simple actually, just create a browser control, use it’s Navigate(url) method for the page you want. For demonstration, I’ll try to get time from the text box on this page.

image

Here’s the code:

            HtmlElement IEElement1 = webBrowser1.Document.GetElementById("clockspot");
            if (IEElement1 != null)
            {
                textBox1.Text += "ie text 1: " + IEElement1.GetAttribute("value") + "rn";
            }
            HtmlElementCollection IEElements = webBrowser1.Document.GetElementsByTagName("input");
            if (IEElements.Count > 0)
            {
                foreach (HtmlElement IEElement in IEElements)
                {
                    if (IEElement.Name == "clockspot")
                        textBox1.Text += "ie text 2: " + IEElements[0].GetAttribute("value") + "rn";
                }
            }

This shows two ways you con search for a text box on the page. The first is by element ID. Actually, the textbox only have the name attribute but since this is IE and we have hideous rule all around, it will return the text box you want. The second way is to get all <input> elements on the Pago and check their name attribute.

Once you have found the textbox, get its “value” attribute. This value is updated in real time (the clock on the page is updated with JavaScript dynamically)

image

Side note: Running browser control in standard compliance mode

The first time you test the C# browser control  when you have IE8 installed may make you disappointed – it does not pass. This is because by default, the control run in quirks mode (this can break a lot of stuff). To force it to render the page in standard mode you’ll have to edit the registry, add this value:

[HKEY_CURRENT_USERSoftwareMicrosoftInternet ExplorerMainFeatureControlFEATURE_NATIVE_DOCUMENT_MODE]
"MyApplication.exe"=dword:13880

Replace MyApplication with your application’s executable name. Also if you are debugging in Visual Studio, add MyApplication.vshost.exe too.

With Firefox… or more exactly, XUL runner

XUL runner is the engine behind Firefox, it renders everything from web pages to Firefox itself. Being a cross platform platform (confusing isn’t it?), it will provide you with more cross-platform flexibility and may even save you from having to recompile the code. The is – if Mono is going somewhere.

XUL runner doesn’t work with C# out of the box, you’ll have to download Skybound’s  component – GeckoFX for that. This will give you a nice drag-and-drop component on the toolbox, but you are not done yet!

The basic component only provide basic functions like navigate, get/set for element text. To work with values contained in specific types of element you’ll need an experimental feature – DOM manipulation, which can be downloaded from here.

Add it to the original GeckoFX project, recompile it with the appropriate XUL version (1.8 or 1.9 or 1.9.1) and then you can use the code below:

            GeckoElementCollection FFElements = geckoWebBrowser1.Document.GetElementsByName("clockspot");
            if (FFElements.Count > 0)
            {
                GeckoElement Element = FFElements[0];
                GeckoInputElement Input = new GeckoInputElement(Element.DomObject);
                textBox1.Text += "ff text: " + Input.Value + "rn";
            }

The first step is similar to what you do with the IE control: search for the element. When it’s found, convert it to an input element (GeckoInputElement), then you can get its value. This is also updated in real time.

Conclusion

Due to time constraints, this post can only cover the very basic. You can explore other feature of the controls yourself – try replacing the “get” with “set” for example. Applications is endless. You can now make Rapidshare furious, dominate the stock market or use the internet itself for your data mining project!

Web application testing framework

This is an update section and is added on 10/07/2010

If you happened to use the .net Framework and wanted to test your web application’s behaviour in browsers, you have WaitN. Perhaps a bit of code from its first page will best illustrate what it does

[Test] 
public void SearchForWatiNOnGoogle()
{
 using (var browser = new IE("http://www.google.com"))
 {
  browser.TextField(Find.ByName("q")).TypeText("WatiN");
  browser.Button(Find.ByName("btnG")).Click();
  
  Assert.IsTrue(browser.ContainsText("WatiN"));
 }
}

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.