EXCEL – Get Data From Website That Requires A Login

Manostaxx

The text that follows is owned by the site above referred.

Here is only a small part of the article, for more please follow the link

SOURCE: http://dailydoseofexcel.com/archives/2011/03/08/get-data-from-website-that-requires-a-login/

If you want to get data into Excel from a website that requires a login, you may have already been frustrated that web queries don’t work so well. By automating Internet Explorer and the login process, you can get to that data. This example shows how to login, retrieve a table, and paste it into Excel.

I’ll be using a website I created to demonstrate. The username is dailydose and the password is password. Behind the login, there’s a table that I stole from Contextures.com. I steal that table for all my examples, so don’t tell Debra. If you attempt to get the table, the site recognizes that you’re not logged in and sends you to the login page. Only after logging in can you get to the goods.

To automate Internet Explorer, we’ll need to set a reference to Microsoft Internet Controls.

While you’re in the References dialog, set a reference to Microsoft Forms 2.0 Library. This will already be checked if you have a userform in your project. I’ll use it to put some text into the clipboard.

Before I get to the code, I need to do a little homework. I go to the login page using Firefox and choose Page Source form the View menu.

I take note of the names of the two controls on the form, login and password. I’ll need those later. I also note that the input type is submit. Next, I go to the page I ultimately want and look at its source.

I note that the id for the table I want is “sampletable”. If the table you want doesn’t have an id, you need to loop through all of the tables and try to find some unique characteristic of the table to identify it. Here’s an example that loops through all the tables until it finds the 15-Year-Fixed Average Interest Rate table on Yahoo!’s finance page.

I have all the data I need to produce the code

Sub GetTable()

Dim ieApp As InternetExplorer
Dim ieDoc As Object
Dim ieTable As Object
Dim clip As DataObject

‘create a new instance of ie
Set ieApp = New InternetExplorer

‘you don’t need this, but it’s good for debugging
ieApp.Visible = True

‘assume we’re not logged in and just go directly to the login page
ieApp.Navigate “http://severe-frost-552.heroku.com/login”
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop

Set ieDoc = ieApp.Document

‘fill in the login form – View Source from your browser to get the control names
With ieDoc.forms(0)
.login.Value = “dailydose”
.Password.Value = “password”
.submit
End With
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop

‘now that we’re in, go to the page we want
ieApp.Navigate “http://severe-frost-552.heroku.com/”
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop

‘get the table based on the table’s id
Set ieDoc = ieApp.Document
Set ieTable = ieDoc.all.Item(“sampletable”)

‘copy the tables html to the clipboard and paste to teh sheet
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText “” & ieTable.outerHTML & “”
clip.PutInClipboard
Sheet1.Select
Sheet1.Range(“A1”).Select
Sheet1.PasteSpecial “Unicode Text”
End If

‘close ‘er up
ieApp.Quit
Set ieApp = Nothing

End Sub

The code first goes directly to the login page. This wouldn’t be necessary if you already had a cookie remembering that you logged in. However, rather than check to see if we get to the right page, I go login every time.

Next we populate the two controls, login and password, and submit it.

This actually takes us right where we want to be. But that isn’t always the case. Sometimes you’ll want to go a different page than the one that the login script redirects to. Next in the code, I navigate to the page I want.

You’ll notice that every time I navigate somewhere (or submit, causing a redirect), I run the same six lines of code. Those six lines, condensed to two with colons, cause the code to wait around until the page is done loading.

Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop

Now that the administrative work is done and my page is loaded, I get the table that has the id “sampletable”. I grab the OuterHTML property, surround it with html tags, and stuff it in the clipboard via the DataObject. The html tags trick Excel into thinking the format of the DataObject is html. Duly tricked, Excel will parse the html and try to make sense of it. It should work just as if you had copied from the web page and pasted into Excel. I select cell A1 and Paste Special Unicode Text.

Finally, I quit IE.

I don’t know if this will work on every website, but it’s worked on a few that I’ve tried. I’ve also experienced some problems with it on other people’s machines. We never got to the bottom of it, but we think it’s firewall related. If you have problems, leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s