Delphi Programming with TWebBrowser
How to get the HTML displayed in a TWebBrowser
There are three ways to get the HTML displayed in a web browser:
- Obtain the HTML from the WebBrowser DOM
- Obtain the HTML from the WebBrowser
- Obtain the HTML from the browser cache
Each approach has advantages and disadvantages.
Obtain the HTML from the WebBrowser DOM
To retrieve the HTML directly from the WebBrowser's DOM:
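// Requires SHDocVw and MSHTML in the uses clause.
function GetHtml(const webBrowser: TWebBrowser): String;
var
  document: IHTMLDocument2;
begin
  result := '';
  document := webBrowser.Document as IHTMLDocument2;
  // Document or body is nil until a page has been loaded.
  if (document <> nil) and (document.body <> nil) then
    result := document.body.innerHTML;
end;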
This is simple and works well. Its main drawback is that it returns the HTML as the web browser has rendered it, which is not necessarily the same as the original HTML. For example, if the original HTML file included:
<script type="text/javascript"> document.write('Hello'); </script>
then the HTML returned by the above function will contain the "Hello" but not the "<script ...". It also omits everything in the document head (such as the title and keywords), because only the inner HTML of the body is returned.
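If the head is needed and the rendered form of the page is acceptable, one possible variation is to read the outer HTML of the root element instead of the body. The following is a minimal sketch only: the unit name RenderedHtml and the function name GetRenderedDocumentHtml are made up for illustration, and it assumes that the MSHTML unit shipped with your Delphi version declares IHTMLDocument3. It still returns the DOM as rendered, so script output such as the "Hello" above appears in place of whatever the original source contained.

```pascal
unit RenderedHtml; // hypothetical unit name, for illustration only

interface

uses
  SHDocVw;

// Returns the rendered HTML of the whole document (head and body),
// or '' if no HTML document is loaded.
function GetRenderedDocumentHtml(const webBrowser: TWebBrowser): String;

implementation

uses
  SysUtils, MSHTML;

function GetRenderedDocumentHtml(const webBrowser: TWebBrowser): String;
var
  document3: IHTMLDocument3;
  root: IHTMLElement;
begin
  result := '';
  // Supports returns False when Document is nil or is not an HTML document.
  if not Supports(webBrowser.Document, IHTMLDocument3, document3) then
    Exit;
  // documentElement is the root <html> element of the rendered DOM.
  root := document3.documentElement;
  if root <> nil then
    result := root.outerHTML;
end;

end.
```

A call such as GetRenderedDocumentHtml(WebBrowser1) would normally be made once the page has finished loading, since the DOM is incomplete before then.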
Obtain the HTML from the WebBrowser
The following function will extract the HTML from a WebBrowser, including the header block as well as the body of the HTML:
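// Requires Classes, ActiveX and SHDocVw in the uses clause.
function GetBrowserHtml(const webBrowser: TWebBrowser): String;
var
  strStream: TStringStream;
  adapter: IStream;
  browserStream: IPersistStreamInit;
begin
  strStream := TStringStream.Create('');
  try
    browserStream := webBrowser.Document as IPersistStreamInit;
    // soReference: the adapter does not own strStream, so it is freed below.
    adapter := TStreamAdapter.Create(strStream, soReference);
    browserStream.Save(adapter, true);
    result := strStream.DataString;
  finally
    // Free inside the finally block so the stream is released on error too.
    strStream.Free();
  end;
end;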
Obtain the HTML from the browser cache
The following example shows how to retrieve the HTML from the browser cache:
// Requires Windows, WinInet, Forms and SHDocVw in the uses clause.
var
  h_cachedInternet: HINTERNET;

function GetRawHtml(const web_browser: TWebBrowser): String;
var
  http_handle: HINTERNET;
  buffer: array [0..4095] of AnsiChar;
  chunk: AnsiString;
  url: String;
  bytes_read: DWORD;
begin
  result := '';
  url := web_browser.LocationURL;
  http_handle := InternetOpenUrl(h_cachedInternet, PChar(url), nil, 0,
                                 INTERNET_FLAG_NO_UI, 0);
  if http_handle = nil then
    Exit;
  try
    //--------------------------------------------------------------
    // Retrieve the URL data. Hopefully this should be straight from
    // the cache because of how the internet connection was defined.
    //--------------------------------------------------------------
    repeat
      bytes_read := 0;
      // Stop on failure; otherwise bytes_read would be meaningless.
      if not InternetReadFile(http_handle, @buffer, SizeOf(buffer), bytes_read) then
        Break;
      SetString(chunk, PAnsiChar(@buffer[0]), bytes_read);
      result := result + String(chunk);
    until bytes_read = 0;
  finally
    InternetCloseHandle(http_handle);
  end;
end;

initialization
  //--------------------
  // Initialise WinInet.
  //--------------------
  h_cachedInternet := InternetOpen(PChar(application.title),
                                   INTERNET_OPEN_TYPE_PRECONFIG_WITH_NO_AUTOPROXY, nil, nil,
                                   INTERNET_FLAG_FROM_CACHE);

finalization
  // Release the WinInet session opened above.
  if h_cachedInternet <> nil then
    InternetCloseHandle(h_cachedInternet);
The technique itself does not depend on TWebBrowser (the example uses the browser only to supply the URL), so it can be adapted for applications that know the URL but have no browser instance.
Note:
- It uses WinInet functions and only uses the browser to obtain the URL.
- It reads the file directly from the WinInet file cache, so it assumes that the file in the cache is the same as the one used by the TWebBrowser. The assumption is reasonable most of the time, but the file may have been flushed from the cache, may never have been cached, or may have been replaced by a different copy fetched by another web browser.
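Since the cached copy may be missing, one way to soften the caveats above is to take the URL directly and fall back to a normal request when the cache-only read returns nothing. The following is a rough sketch under assumptions rather than a definitive implementation: the unit name CachedHtmlFetcher, the functions FetchHtml and FetchHtmlPreferCache, and the agent string 'HtmlFetcher' are all made up for illustration. It uses the same WinInet calls as the example above, and it treats the bytes received as ANSI text, which will not suit every page encoding.

```pascal
unit CachedHtmlFetcher; // hypothetical unit name, for illustration only

interface

// Returns the content at url, or '' on failure. When cacheOnly is True the
// WinInet session is opened with INTERNET_FLAG_FROM_CACHE.
function FetchHtml(const url: String; cacheOnly: Boolean): String;

// Tries the WinInet cache first and falls back to a normal request.
function FetchHtmlPreferCache(const url: String): String;

implementation

uses
  Windows, WinInet;

function FetchHtml(const url: String; cacheOnly: Boolean): String;
var
  session, request: HINTERNET;
  buffer: array [0..4095] of AnsiChar;
  chunk: AnsiString;
  bytes_read, flags: DWORD;
begin
  result := '';
  if cacheOnly then
    flags := INTERNET_FLAG_FROM_CACHE
  else
    flags := 0;
  // 'HtmlFetcher' is an arbitrary user-agent string chosen for this sketch.
  session := InternetOpen('HtmlFetcher', INTERNET_OPEN_TYPE_PRECONFIG,
                          nil, nil, flags);
  if session = nil then
    Exit;
  try
    request := InternetOpenUrl(session, PChar(url), nil, 0,
                               INTERNET_FLAG_NO_UI, 0);
    if request = nil then
      Exit; // the finally block below still closes the session
    try
      repeat
        bytes_read := 0;
        if not InternetReadFile(request, @buffer, SizeOf(buffer), bytes_read) then
          Break;
        // Assumption: the bytes are treated as ANSI text.
        SetString(chunk, PAnsiChar(@buffer[0]), bytes_read);
        result := result + String(chunk);
      until bytes_read = 0;
    finally
      InternetCloseHandle(request);
    end;
  finally
    InternetCloseHandle(session);
  end;
end;

function FetchHtmlPreferCache(const url: String): String;
begin
  result := FetchHtml(url, True);
  // An empty result is treated as a cache miss, so retry over the network.
  if result = '' then
    result := FetchHtml(url, False);
end;

end.
```

With a browser available this could be called as FetchHtmlPreferCache(WebBrowser1.LocationURL). Note that a copy fetched over the network may differ from the page the browser is actually displaying.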
See also: How to navigate a frameset.
These notes are believed to be correct for Delphi 6, but may apply to other versions as well.
About the author: Brian Cryer is a dedicated software developer and webmaster. For his day job he develops websites and desktop applications as well as providing IT services. He moonlights as a technical author and consultant.