Web page interoperability with .NET browser controls

Everyone knows that the best way to manipulate or extract data from a hypertext page is to use JavaScript. The langue is built entirely to serve that purpose so it’s natural to think about it every time someone tell you to get an element’s text or skip unnecessary screens. Firefox’s XUL even relies on the language to control the entire application’s behavior.

That method does well most of the time but it requires that you have access to the JavaScript source and can modify the page, which is not the case in some situation:

  1. Rapidshare makes you wait for two minutes or so when you want to get some file from them for free. It’s better than before when you have to go past some feline CAPTCHA but you still have to click on the download button after the timer expire, which means you can’t click on a Rapidshare link and leave it to download itself. Will Rapidshare allow you to edit the download page to insert the JavaScript that will click the button for you? No!
  2. You want to capture stock value from your finance information agency’s web site and save it into an Excel file for later reference, what should you do? Ask Yahoo Finance to modify its web page to send some fancy XMLRequest to your server every time it updates? And even if they do so you’ll have to make a listening server for those request on your side.

Of course there are other solution to those problem other than coding it on your own. There’s an extension for Firefox called Skipscreen that will automatically download the Rapidshare file for you or you con hire a better information provider that have built-in analysis functions on their site for you.

But if you are a control freak like me and want to make sure everything work exactly as what it is intended for then crawling through thousands of line of Skipscreen’s source code can be a bit of a pain (mind you that JavaScript is neither a strong typed or object oriented language so have fun visualizing variables and objects in your head). About the stock, can you be sure that your provider doesn’t tail gate your order? (e.g. You decide to buy some interesting stock and order them to do so, they find that stock interesting too and buy the same stock before actually place your order so they get the better price)

With Internet Explorer (e.g. the default browser control)

It’s simple actually, just create a browser control, use it’s Navigate(url) method for the page you want. For demonstration, I’ll try to get time from the text box on this page.

Here’s the code:

            HtmlElement IEElement1 = webBrowser1.Document.GetElementById("clockspot");
            if (IEElement1 != null)
            {
                textBox1.Text += "ie text 1: " + IEElement1.GetAttribute("value") + "rn";
            }
            HtmlElementCollection IEElements = webBrowser1.Document.GetElementsByTagName("input");
            if (IEElements.Count > 0)
            {
                foreach (HtmlElement IEElement in IEElements)
                {
                    if (IEElement.Name == "clockspot")
                        textBox1.Text += "ie text 2: " + IEElements[0].GetAttribute("value") + "rn";
                }
            }

This shows two ways you con search for a text box on the page. The first is by element ID. Actually, the textbox only have the name attribute but since this is IE and we have hideous rule all around, it will return the text box you want. The second way is to get all <input> elements on the Pago and check their name attribute.

Once you have found the textbox, get its “value” attribute. This value is updated in real time (the clock on the page is updated with JavaScript dynamically)

Side note: Running browser control in standard compliance mode

The first time you test the C# browser control  when you have IE8 installed may make you disappointed – it does not pass. This is because by default, the control run in quirks mode (this can break a lot of stuff). To force it to render the page in standard mode you’ll have to edit the registry, add this value:

[HKEY_CURRENT_USERSoftwareMicrosoftInternet ExplorerMainFeatureControlFEATURE_NATIVE_DOCUMENT_MODE]
"MyApplication.exe"=dword:13880

Replace MyApplication with your application’s executable name. Also if you are debugging in Visual Studio, add MyApplication.vshost.exe too.

With Firefox… or more exactly, XUL runner

XUL runner is the engine behind Firefox, it renders everything from web pages to Firefox itself. Being a cross platform platform (confusing isn’t it?), it will provide you with more cross-platform flexibility and may even save you from having to recompile the code. The is – if Mono is going somewhere.

XUL runner doesn’t work with C# out of the box, you’ll have to download Skybound’s  component – GeckoFX for that. This will give you a nice drag-and-drop component on the toolbox, but you are not done yet!

The basic component only provide basic functions like navigate, get/set for element text. To work with values contained in specific types of element you’ll need an experimental feature – DOM manipulation, which can be downloaded from here.

Add it to the original GeckoFX project, recompile it with the appropriate XUL version (1.8 or 1.9 or 1.9.1) and then you can use the code below:

            GeckoElementCollection FFElements = geckoWebBrowser1.Document.GetElementsByName("clockspot");
            if (FFElements.Count > 0)
            {
                GeckoElement Element = FFElements[0];
                GeckoInputElement Input = new GeckoInputElement(Element.DomObject);
                textBox1.Text += "ff text: " + Input.Value + "rn";
            }

The first step is similar to what you do with the IE control: search for the element. When it’s found, convert it to an input element (GeckoInputElement), then you can get its value. This is also updated in real time.

Conclusion

Due to time constraints, this post can only cover the very basic. You can explore other feature of the controls yourself – try replacing the “get” with “set” for example. Applications is endless. You can now make Rapidshare furious, dominate the stock market or use the internet itself for your data mining project!

Web application testing framework

This is an update section and is added on 10/07/2010

If you happened to use the .net Framework and wanted to test your web application’s behaviour in browsers, you have WaitN. Perhaps a bit of code from its first page will best illustrate what it does

[Test] 
public void SearchForWatiNOnGoogle()
{
 using (var browser = new IE("http://www.google.com"))
 {
  browser.TextField(Find.ByName("q")).TypeText("WatiN");
  browser.Button(Find.ByName("btnG")).Click();
  
  Assert.IsTrue(browser.ContainsText("WatiN"));
 }
}

Local browsing with CoolIris

UPDATE: You can simply use CoolIris 1.10 for the same purpsose

Full-Screen, 3D — Cooliris transforms your browser into a lightning fast, cinematic way to browse online photos and videos. Our “3D Wall” lets you fly through thousands of items in the blink of an eye on an infinitely expandable wall. To enjoy Cooliris on Google Images, Facebook, YouTube, Flickr, and hundreds of other sites, click the Cooliris icon that appears when you mouse over media on the supported site. Or enjoy Cooliris anytime by clicking the browser toolbar icon. See http://www.cooliris.com/product for details.

I was looking for an app that let me browse through my image gallery Apple style – PhotoFlip, but there seems to be none, unless I download Safari :-/. I don’t want a new browser right now, so I turned to the next best thing: CoolIris. Somehow I vividly recall this is a project started from Microsoft Research but I can’t find any trace of Microsoft there. Maybe the Cooliris team doesn’t like Microsoft in the end. 😛

Well, at least I expected it to be able to automatically detect links to picture on a page and then let me view them. It supported Flickr and Google Images off-screen loading, which is much more complicated than a list.

What a shame it doesn’t.

I’m not the only person looking for a “standalone CoolIris” to view my files, the CoolIris forum is filled with similar requests but the best response they can give is “wait” and an “CoolIris Lite” in which the effect is much simple and you seems to have to add files manually.

No problem, I can fix that.

I tried the “quick” option from the developer’s site but that turned out to be the more complicated option since I have to generate both the expression and the gallery for it to work, so I tried the full option: I created a C# application that will read files from the directory you want, write two files “media.htm” and “media.rss” to the output directory.

  • Media.htm contains one picture to activate the CoolIris icon on the toolbar, links to media.rss.
  • Media.rss is the feed that lists the files which will let CoolIris know the full list of images.

The core part of the application follows

            // Read jpg and jpeg from target directory
            string[] Files = Directory.GetFiles(textBox1.Text, "*.jp*");
            // Prepare the feed
            StreamWriter writer = new StreamWriter(textBox2.Text + "media.rss");
            writer.WriteLine("<?xml version="1.0" encoding="UTF-8" standalone="yes"?>");
            writer.WriteLine("<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss" xmlns:atom="http://www.w3.org/2005/Atom">");
            writer.WriteLine("<channel>");
            writer.WriteLine("<title>Generated photos</title>");
	        writer.WriteLine("<link>file:///" + textBox2.Text.Replace("\","/") + "</link>");
            writer.WriteLine("<description>Photos from my trip to Africa.</description>");
            int Counter = 1;
            foreach (string FileName in Files)
            {
                FileInfo Item = new FileInfo(FileName);
                writer.WriteLine("<item>");
                writer.WriteLine("		<title>" + Item.Name + "</title>");
                writer.WriteLine("		<link>" + "file:///" + FileName.Replace("\","/") + "</link>");
                writer.WriteLine("		<guid>img " + Counter.ToString() + "</guid>");
                writer.WriteLine("		<media:description>" + Item.Name + "</media:description>");
                writer.WriteLine("		<media:thumbnail url="file:///" + FileName.Replace("\","/") + "" />");
                writer.WriteLine("		<media:content url="file:///" + FileName.Replace("\","/") + "" type="image/jpeg" />");
                writer.WriteLine("	</item>");
                writer.WriteLine("");
            }
            writer.WriteLine("</channel>");
            writer.WriteLine("</rss>");
            writer.Close();
            //Prepare the HTML file
            writer = new StreamWriter(textBox2.Text + "media.htm");
            writer.WriteLine("<html>");
            writer.WriteLine("  <head>");
            writer.WriteLine("    <link rel="alternate" href="media.rss" type="application/rss+xml" title="" id="gallery" />");
            writer.WriteLine("  </head>");
            writer.WriteLine("  <body>");
            foreach (string FileName in Files)
            {
                FileInfo Item = new FileInfo(FileName);
                writer.WriteLine("<a href="" + "file:///" + FileName.Replace("\", "/") + "">");
                writer.WriteLine("<img alt="[Image]" src="" + "file:///" + FileName.Replace("\", "/") + "" class="photo">");
                writer.WriteLine("</a>");
                break;
            }
            writer.WriteLine("  </body>");
            writer.WriteLine("<html>");
            writer.Close();
            //Open the HTML file with the default browser, hopefully it's the one with CoolIris installed
            System.Diagnostics.Process.Start(textBox2.Text + "media.htm");

Application interface

Fill in the directory containing the pictures you want to view in the first text box, the second could be your temp directory. Click generate to generate the files mentioned above and open them in your default browser, click the CoolIris to activate CoolIris (this extra step is required because I don’t know how to call CoolIris on a file directly; the “Launch CoolIris” application seems to open Cooliris.com only).

I also made a command line interface with which you can call the app with 2 parameters. The first will go to “Directory” and the second to “Output”. If you run the app this way it will generate, open the file and then close itself.


Result

Download the application

Requirements:

  • .NET Framework 2.0
  • CoolIris enabled browser

Tumblelogs

Tumblelog is a term coined in 2005[1] to denote blogs which favor short posts, most of the time sharing only a single item. I encountered the first such blog on 2007[2] but it didn’t really impressed me. Not until last night, when I was shown a comic translation blog [3]. Tumblelogs are just great for the purpose! It allowed instant sharing of daily comic (one thing jumps right in my thoughts: there is a bookmarklet somewhere) without any comments, category or other hassles. On the other hand, perhaps the comic blog above is too simple it don’t tell us where the comics come from, nor mention Garfield’s author (so I’m not condoning it at all, just an example).

It’s also worth a note to distinguish between tumblelog and microblog. While microblog is primarily aimed at status updates, it also allows you to post multimedia content like tumblelog. But your status could change anytime, anywhere, so you’ll need an equivalent update convenience (e.g. mobile updates). On the other hand, tumblelogs are designed to share content, and as you may encounter most content while surfing, think bookmarklet and email-to-post. You can see the gap between two services is not big, and as this article is being written, more features are being stuffed to twitter (that’s why its interface has turned in to a big bunch of text instead of one line of status :P)

Okay, too much ranting again. I only wanted a short review about current tumblelog services when I started writing!

Tumblr

Not the first Tumblelog software, but the first one is currently down so I can’t test it :P.

Tumblr seems to be the most popular tumblelog service around. Clean interface, but there are only a few themes available. None looked good enough for me :/. The theme installation from theme garden worked erratic somehow. For the functionality, Tumblr is pretty good, it pulls the page to its server after you add it with the bookmarklet; analyze the page to find contents you may want to blog about on the page.

Tumblr could be linked with twitter so that all twitter updates will be copied to your Tumblr and vice versa. Also, you can import feeds, bookmark from social bookmarks and posts from other blogging services (thus Tumblr could also act as a bridge to let your twitter followers know when you write a post or bookmark something).

Media could be uploaded to Tumblr but there is a 10MB file size limit (I’m unsure what the storage limit is, see quote below). If you want to post bigger stuff, you are on your own.

Tumblr retains the right to create limits on use and storage in its sole discretion at any time with or without notice. [Terms of service]

The bookmarklet is in the “Goodies” tab; there you’ll also find your email-to-post address, iPhone app and third party apps. The dashboard interface is a little clumsy, functions are distributed on tabs, sidebar, and a menu; but that doesn’t seem to limit Tumblr’s popularity.

You can customize the theme with HTML and CSS. You can also attach your Tumblr to your domain as long as you are able to modify A records (it took a really long time to propagate through the nameservers so you may be unable to view your blog for one or two days if you use this). There are APIs available, but I haven’t looked to see what they can do (yet).

Soup.io

Quite a competitor for Tumblr, most of the features are the same. The interface has more web 2.0 slides and fades, but the page still refreshes to update itself so I’m still not completely satisfied with this service. It allows you to import from more services (say furl or weheartit) but its bookmarklet isn’t as good. It will only detect media on popular sites; it does not analyze the page so it’s very likely that you’ll have to prepare a link to the media first.

The basic theme structure seems to be fixed. However you are allowed to theme it with CSS

Gelato

Seeing Tumblelogs in action somewhat annoy me why such a small amount of content take a whole page load? Isn’t some AJAX to flip pages are better? Then I went to the usual developer’s craze wanting to develop a whole new service that satisfies my every need, since the above are hosted by the blog provider. And as paranoid as a user may be, leaving data on a stranger’s server just doesn’t bring a nice feeling :P.

Fortunately, thank to the free culture (as in free beer :P), someone must have done the job for you! In this case it’s Pedro Santana: from Gelato’s homepage you can download the source (latest at the time of writing: 0.95).

Pretty close clone of Tumblr. To the average user Gelato only provides basic features. You can blog, change the theme, use the bookmarklet but that’s all. The advantage is you have full control over it. Its template share similarities with WordPress.

Still no AJAX for page transition, which I’m pretty disappointed since it advertised itself “built upon AJAX”. But I don’t think that would be a problem though, you have the source of a working Tumblelog!

Toy box

After finishing 7 years worth of Megatokyo drawings (you might have seen from the last post :P), I decided to wrap things up and finished polishing my toybox

It’s a collection of stuff I code from time to time. They were usable before, but wasn’t integrated and their look was so Web 1.0, I always wanted to “renew” them a little, now I am finally in the right condition to get it done :).

At the time of this post, my toy box contains:

You may read more about them from their page. If you have any comment for them, you can leave them here

Building a PHP file browser

This week’s assignment that I gave to another group member, but the result turned out unsatisfactory. It can’t even delete a folder, move a file or copy them. Yes, it’s pretty easy to build a PHP file browser, it’s just a matter of calling file_* functions, but combining them together can be tricky. For example, this guy used GET to pass data and operation between screens. What happen if a file name contains & or =, the characters that make the GET themselves? It has to be revamped (and I have to do this -__-)

About code style, they claim to have done it in procedural style, but… I don’t know what kind of programming is that to name. Well, it worked on some aspects nevertheless, so I decided to move straight to fix the interface. I just can’t bear to see that pile of mumble-jumble ><; and the result…

Interface

Icons are from famfamfam. Thanks James! You may notice that it looks familiar. Yes, it resembles Vista’s explorer. I love Mac in overall but for the interface alone, explorer looks better than finder. I also tested some css pseudo-class: the rows will be highlighted when you move over them. A breadcumb would be trivial to implement too, but that’s something I can work with already 🙂

The code

Basically, this is just a directory list with more function added at the beginning so that it will execute the required operation before listing. The original code used scandir() to get the files in the directory, which is a php5 function (the only thing PHP5 specific in the project); I decided to fall back to its PHP4 equivalent so it can run on older servers (the code is from PHP manual)

$dh  = opendir($newPath);
while (false !== ($filename = readdir($dh))) {
	$files[] = $filename;
}
sort($files);

I know, most host have upgraded to PHP5 and PHP6 is at the horizon, but when it’s so easy to make it compatible, why not?

For most online file browser I’ve seen, they will only allow clicking on the filenames to open them; while on your familiar explorer, nautilus or finder you can click anywhere on the row, so I decided to move the onclick to the row <tr>.
It worked! But then another problem rises: whenever the checkbox is ticked, the view changes to the corresponding file / directory without waiting for the user to choose what to do with the file!. This is possibly the reason why online file browsers does not allow this 😉 After some research, I found a way to stop the event from bubbling (Internet Explorer terminology) or propagating (W3C terminology) from the checkbox to the row (i.e. after you check the checkbox the browser will stop and forget the fact that you clicked inside the row)

function doSomething(e)
{
	if (!e) var e = window.event;
	e.cancelBubble = true;
	if (e.stopPropagation) e.stopPropagation();
}

The code above do fine if you set the event with attachEvent(), but for the code I have the event is attached in the traditional way (onclick=”js”). And trying to put an anonymous function in is no use, “e” would be undefined in all of the browsers 🙁

The secret? In IE the window.event could be accessed but firefox does not seems to have it. Instead, it have “event” (duh) and the following code work (I wrote it together in one line to put to the onclick)

if(!e){var e = window.event;if(!e){e = event;if(e.stopPropagation)e.stopPropagation();}e.cancelBubble = true;}

Coding this, just to know that directory browsing can be tricky. There are functions, like that there’s a rmdir, but It just won’t delete non-empty directories and unlike the command you know from popular OSes, it doesn’t have /s to –-recursive to force delete the directory. That’s when you have to write your own function to get the job done.

function recursiveDel($dir) {
		if ($dir[strlen($dir) - 1] != "/")
			$dir = $dir."/";
		$mydir = opendir($dir);
		while(($file = readdir($mydir)) !== false) {
			if($file != "." && $file != "..") {
				// Unix compat
				chmod($dir.$file, 0777);
				if(is_dir($dir.$file)) {
					chdir('.');
					recursiveDel($dir.$file.'/');
					rmdir($dir.$file);
				}
				else
					unlink($dir.$file);
			}
		}
		closedir($mydir);
		rmdir($dir);
	}

Now that most directories can be deleted, what about directories that contains special characters mentioned above? Well, you may notice when browsing that sometimes your Address bar contains something like %2f%ac; they are ASCII characters, they can be embedded in an URL and then retrived later thanks to urlencode() and urldecode(). A better solution is instead of using GET, use POST or a hidden field.

About security, the guy didn’t even check what the user put in. The only check there is the firectory name check “Invalid directory name. It can’t contains . .. /”. Anyone could just put in path =/ etc / passwd in the address bar at any time. What a common beginner pitfall! A couple of str_replace() wouldn’t hurt…

Below is the code to this moment. Note that its capability is limited to navigating, creating directories and Upload file, but at least it have done those quite good and if one’s only need is somewhere to occasionally throw one or two file in; or to study how file managers work, this script is a good choice 🙂

Download the source

Gotta OOP-rize it and clean it up…

4 hours later edit: Phew, done! ^^ You can have a peek over here. This is a stripped down version, so you can’t upload and edit / preview the file for obivous reasons 😛 other than that, all functions displayed on screen is working now.