I have been scraping websites with plain PHP for some time now, mainly using PHP library tools and Simple HTML DOM. However, I have since discovered Selenium WebDriver for Java, which makes things a whole lot easier, and combined with Ant and Maven you can do some wonderful things. You can do some of the same things with Selenium RC Server, but WebDriver is more robust in my opinion.
1) First, download the Eclipse IDE for Java Developers, a popular IDE for developing and testing Java applications.
2) Open Eclipse and create a new class. Add your package name above the class declaration if it is not entered automatically.
3) Import the following libraries so that WebDriver, MySQL, and the other functions your application needs will run:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;
4) Then declare your class
public class myclass {
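5) Initialize WebDriver and MySQL. In this case I am using Firefox, but any browser that WebDriver supports should work. The statements in this and the following steps belong inside a method such as main, declared with throws Exception because the JDBC calls throw checked exceptions.
Normally you connect to the browser with:
WebDriver oWDFF = new FirefoxDriver();
However, in my case there is a problem with the version of Firefox that I am using, so I set the path to the Firefox binary directly:
// create WebDriver and open Firefox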
System.setProperty("webdriver.firefox.bin","C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe");
WebDriver WD = new FirefoxDriver();
// maximize window
WD.manage().window().maximize();
System.out.println("Window Maximized");
6) Connect to the database
// database connection
System.out.println("Connecting to the database..............");
String DBDriver = "com.mysql.jdbc.Driver";
Class.forName(DBDriver);
Connection Conn = DriverManager.getConnection("jdbc:mysql://host url/database", "username", "password");
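The connection step can fail in two ways that the snippet above does not handle: Class.forName throws ClassNotFoundException when the driver JAR is missing from the classpath, and DriverManager.getConnection throws SQLException when the URL or credentials are wrong. Here is a minimal sketch of one way to deal with both. It is not the original code: the class name DbConnect, its helper methods, and the host, port, and database values are all hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class DbConnect {

    // Builds a MySQL JDBC URL of the form jdbc:mysql://host:port/database.
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Returns true if the named class can be loaded, false if it is not on the classpath.
    public static boolean isDriverAvailable(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Opens a connection; the caller is responsible for closing it.
    public static Connection open(String host, int port, String database,
                                  String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port, database), user, password);
    }

    public static void main(String[] args) {
        // Driver class name as used in this tutorial.
        String driver = "com.mysql.jdbc.Driver";
        if (!isDriverAvailable(driver)) {
            System.out.println("MySQL driver not found on the classpath");
            return;
        }
        // Placeholder connection details; try-with-resources closes the connection automatically.
        try (Connection conn = open("localhost", 3306, "mydatabase", "username", "password")) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

With a JDBC 4.0 or later driver the Class.forName call is generally not required, because drivers register themselves, but it does no harm. Newer Connector/J releases name the driver class com.mysql.cj.jdbc.Driver, so check which version of the JAR you have.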
7) Once you are connected, you can run MySQL SELECT, UPDATE, and INSERT statements.
The query string is written the same way as it would be in PHP or any other client that talks to the MySQL server:
String query = "SELECT * FROM TABLE";
8) Create a prepared statement to execute the query
java.sql.PreparedStatement preparedStmt = Conn.prepareStatement(query);
9) Execute the query and store the results in a ResultSet
ResultSet rs = preparedStmt.executeQuery();
10) Read the value from each row and print it
while (rs.next())
{
// "variable" is the name of a column in your table
String variable = rs.getString("variable");
System.out.println("Here is your first Java / Selenium Variable: " + variable);
}
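Steps 7 to 10 can be pulled together into one method. The sketch below assumes a hypothetical table called pages with a url column and a status column (neither comes from the code above). It uses a ? placeholder so the value is bound by the driver rather than concatenated into the SQL string, and it relies on try-with-resources to close the statement and result set. The class and method names are illustrative only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class QueryExample {

    // Hypothetical table and column names, used for illustration only.
    static final String QUERY = "SELECT url FROM pages WHERE status = ?";

    // Runs the query with the given status and returns the url column of every row.
    public static List<String> findUrls(Connection conn, String status) throws SQLException {
        List<String> urls = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, status);      // bind the first ? placeholder
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {         // advance one row at a time
                    urls.add(rs.getString("url"));
                }
            }
        }
        return urls;
    }
}
```

With an open connection such as Conn from step 6, a call like QueryExample.findUrls(Conn, "pending") would return the matching URLs, which could then be passed to WD.get(...) one at a time.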