Commented Issue: SelectSingleNode searches from root when called from child [32539]

April 13, 2012, 8:00 pm

Even after isolating a child node via SelectNodes(), calling SelectSingleNode() on that child searches from the root (or at least from somewhere above the child). This behaviour is unexpected; I would expect it to search only the isolated child and its descendants.

Example:

<HTML>
<HEAD><TITLE> HTML Agility Bug Demo</TITLE></HEAD>
<BODY>
<somestuff>stuff here</somestuff>
<table>
<tr><td>first row</td></tr>
<tr><td>second row</td></tr>
<tr><td>third row</td></tr>
</table>
</BODY>
</HTML>

HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.Load(@"HtmlAgilityBugDemo.html");
HtmlNodeCollection rowNodes = doc.DocumentNode.SelectNodes("//table/tr");
foreach (HtmlNode row in rowNodes)
{
    string test1 = row.InnerText; // Works, enumerates correctly
    string test2 = row.SelectSingleNode("//td").InnerText; // This ALWAYS returns "first row" !!
    string test3 = row.SelectSingleNode("//somestuff").InnerText; // Found somestuff. But no stuff within this node !!
}
Comments: ** Comment from web user: HarryCallahan **

Yes, very annoying, as it's a common thing to do.

I've got around it, or rather contended with it, by loading the child's InnerHtml into a new doc and using that. A heavyweight solution.

↧

New Post: Extracting Images And HTML From .html File

April 16, 2012, 10:35 am

Hello,

I'm new to the Html Agility Pack and was wondering if someone could help me out. I have a WPF C# project with an HTML string as shown below:

htmlString = "<HTML><HEAD></HEAD><BODY>Here are some images.</br>1) <IMG style='MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px' align=right src='images/sample001.jpg'>2) <IMG style='MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px' align=right src='images/sample002.png'></br> And some docs as well.</br>1) href='javascript:parent.POPUP({url:'testDoc001.htm',type:'shared',width:600,height:645})'></br>2) href='javascript:parent.POPUP({url:'testDoc002.html',type:'shared',width:700,height:712})'></br></BODY></HTML>";

I would like to be able to parse this string and get an array of all of the images and .html documents that are referenced.

In this particular example this array would be:

[0] = "images/sample001.jpg"
[1] = "images/sample002.png"
[2] = "testDoc001.htm"
[3] = "testDoc002.html"

Can someone send me a snippet of code or show me how to go about doing this?

Thanks

↧

New Post: Inner Text but the hard way

April 16, 2012, 5:01 pm

I need to get some text from this web page http://openbook.etoro.com/ahanit/#/profile/Trades/ because I want to use the trade feed in my program to analyse market sentiment.

I used the VB browser control and the get element command, but it's not working. The problem is that whenever my browser starts to open the page I get JavaScript errors. Any help is welcome.

I tried with the DOM, but it seems that I don't quite understand what I need to do :) Here is what I have so far:

    Dim code As String
    Using client As New WebClient
        code = client.DownloadString("http://openbook.etoro.com/alemzolota/#/profile/Trades/")
    End Using

    Dim htmlDocument As IHTMLDocument2 = New HTMLDocument(code)
    htmlDocument.write(htmlDocument)


    Dim allElements As IHTMLElementCollection = htmlDocument.body.all

    Dim allid As IHTMLElementCollection = allElements.tags("id")
    Dim element As IHTMLElement

    For Each element In allid
        element.title = element.innerText
        MsgBox(element.innerText)

    Next

Thank you :)

↧

New Post: How to check HtmlWeb.LoadAsync finished

April 16, 2012, 11:18 pm

I have a class in the ViewModel folder that uses HtmlWeb.LoadAsync to get data from the web:

public void GetContent(int index)
{
    //get content
    HtmlWeb.LoadAsync(Magazines[index].Url, (s, args) =>
    {

        ....

        this.Magazines[index].ContentNode = contentNode.InnerHtml;
    });
}

Then I want to read Magazines[index].ContentNode in detailview.xaml like this:

protected override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    string selectedIndex = "";

    if (NavigationContext.QueryString.TryGetValue("selectedItem", out selectedIndex))
    {
        index = int.Parse(selectedIndex);
        App.MagazineViewModel.GetContent(index);
        String content = App.MagazineViewModel.Magazines[index].ContentNode;
        DetailBrser.NavigateToString(
            "<html><head><meta name='viewport' content='width=570, user-scalable=yes' /></head><body>"
            + HtmlHelper.EncodeUnicode(content)
            + "</body></html>"
            );
    }
}

The problem is that the LoadAsync call has not finished yet at that point, so App.MagazineViewModel.Magazines[index].ContentNode is still empty, which makes content empty as well. How can I check in detailview.xaml that App.MagazineViewModel.GetContent(index) has finished, and only then set the content string? Any other ideas are welcome too.
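One possible way to approach this, sketched here under assumptions, is to let GetContent accept a completion callback and invoke it from inside the asynchronous handler. MagazineLoader and the onCompleted parameter are hypothetical names, and the background work item merely stands in for the real HtmlWeb.LoadAsync call:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the view model; all names here are illustrative.
public class MagazineLoader
{
    // Starts an asynchronous load and reports the result through a callback,
    // in the same spirit as the lambda passed to HtmlWeb.LoadAsync above.
    public void GetContent(int index, Action<string> onCompleted)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // Placeholder for the real download and parsing step.
            string content = "<p>content for item " + index + "</p>";

            // Notify the caller only once the data actually exists.
            onCompleted(content);
        });
    }
}
```

With this shape, the page would build the HTML string and call NavigateToString inside the callback rather than immediately after GetContent returns. On the phone the callback probably needs to be marshalled back to the UI thread, for example with Dispatcher.BeginInvoke.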

↧

New Post: A bug when save to a stream

April 18, 2012, 11:31 pm

The methods

"public void Save(Stream outStream, Encoding encoding)"

and

" public void Save(Stream outStream)"

in class HtmlDocument declare a StreamWriter with the default buffer size for writing data to the stream.

However, they never call Flush or Close when writing ends, so any data still in the buffer is lost.

eg:

System.IO.MemoryStream ms = new MemoryStream();
 htmldoc.Save(ms, System.Text.Encoding.UTF8);

Change the method "public void Save(StreamWriter writer)" in HtmlDocument as follows:

public void Save(StreamWriter writer)
{
    Save((TextWriter)writer);
    writer.Flush(); // added: Flush writes the buffered data to the stream
}
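The effect described above can be reproduced without Html Agility Pack. The following is a minimal sketch using only System.IO; FlushDemo and WrittenLength are illustrative names, not part of the library:

```csharp
using System.IO;
using System.Text;

public static class FlushDemo
{
    // Writes text through a StreamWriter and returns how many bytes
    // have actually reached the underlying MemoryStream.
    public static long WrittenLength(string text, bool flush)
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms, Encoding.UTF8);
        writer.Write(text); // a short text stays in the writer's internal buffer

        if (flush)
        {
            writer.Flush(); // pushes the buffered characters into the stream
        }
        return ms.Length;
    }
}
```

For a short string, the unflushed call should report a length of zero, while the flushed call reports a non-zero length.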

↧

New Post: Gett value from single node

April 23, 2012, 3:28 am

I am trying to get a single value from this node:

http://gg.pl/dysk/5TppYI-rUkJB5DppYI-rauA/depozyty%20z%C5%82otowe%20-%20WIBOR%206M-121502.png

on this site:

http://www.money.pl/pieniadze/depozyty/zlotowe/WIBOR6M,depozyty.html

        Dim text As String
        Dim strona As New HtmlAgilityPack.HtmlWeb()
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc = strona.Load("http://www.money.pl/pieniadze/depozyty/zlotowe/WIBOR6M,depozyty.html")
        text = doc.DocumentNode.SelectSingleNode("//html/body/div[4]/div[2]/div[2]/div[2]/div[2]/div/table/tbody/tr/td[2]").InnerText

I get an error when I try to get the value:

System.NullReferenceException was unhandled

So, where is my mistake?
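A NullReferenceException at this point is consistent with SelectSingleNode returning null, which it does when the XPath matches nothing. One common cause, which is only an assumption here, is a path copied from a browser: browsers insert tbody elements that may not exist in the raw markup. A guarded lookup could look like this sketch (SafeSelect and InnerTextOrNull are hypothetical helper names, shown in C# rather than VB):

```csharp
using HtmlAgilityPack;

public static class SafeSelect
{
    // Returns the inner text of the first node matching xpath,
    // or null when the expression matches nothing.
    public static string InnerTextOrNull(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

Checking the result for null before reading InnerText makes it easier to see which part of the path fails to match, for example by shortening the expression step by step.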

↧

Created Issue: Cannot find type System.Xml.XPath.XPathNavigator in module System.Xml.dll [32616]

April 23, 2012, 5:25 am

Happens as soon as I include the HtmlAgilityPack.dll reference and build the project.

↧

New Post: Html encoding only text nodes

April 23, 2012, 9:40 am

Hi,

Struggled with this for a while.

Can Html Agility Pack currently encode only text nodes?

I.e.:

hi <3 Did you know we're stocklists blah blah, can read our blog here <a href='http://google.com'>http://google.com</a> blah

<3 hi

would become

hi &lt;3 Did you know we&#39;re stocklists blah blah, can read our blog here <a href='http://google.com'>http://google.com</a> blah

&lt;3 hi

I searched the web and found no answer. Because of the processing I'm doing on the HTML text, I don't want HTML tags such as links (or any others) to be encoded as well.

Thanks a ton,

Doru

↧

Commented Issue: SelectSingleNode searches from root when called from child [32539]

April 23, 2012, 2:05 pm

Html Agility Pack uses XPath to address nodes. See the XPath examples at http://msdn.microsoft.com/en-us/library/ms256086.aspx. Html Agility Pack is doing exactly what you’ve asked it to do. The node ‘row’ is the current context node, not the root of a new document, and a path that begins with “//” always starts from the document root. Thus “//td” asks for the first <td> element in the whole document, which is always “first row” in your example.

If you want to search the current node and its descendants, use “.//td” and “.//somestuff”.
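As a minimal sketch of the relative form, assuming the HTML from the original report is passed in as a string (RelativeXPathDemo and FirstCellOfEachRow are illustrative names):

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Returns the text of the first <td> found inside each <tr> of a table.
    public static List<string> FirstCellOfEachRow(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table/tr");
        if (rows == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode row in rows)
        {
            // The leading "." anchors the search at 'row' instead of the document root.
            HtmlNode cell = row.SelectSingleNode(".//td");
            if (cell != null)
            {
                result.Add(cell.InnerText);
            }
        }
        return result;
    }
}
```

Called with the three-row table from the report, this should yield one entry per row rather than repeating the first cell.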

↧

Commented Issue: SelectSingleNode searches from root when called from child [32539]

April 23, 2012, 6:30 pm

I stand corrected, and surprised.

Testing with XmlDocument:

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
var rowNodes = doc.SelectNodes("//table/tr");
foreach (XmlNode row in rowNodes)
{
    string test1 = row.InnerText; // Works, enumerates correctly
    string test2 = row.SelectSingleNode("//td").InnerText; // This ALWAYS returns "first row" !!
    string test3 = row.SelectSingleNode("//somestuff").InnerText; // Found somestuff. But no stuff within this node !!
}

The result is the same, so, as ThomasGat says, this is expected XPath behaviour, and it is corrected simply by prefixing the path with "." to denote the current node.

↧

New Post: Disable Proxy

April 24, 2012, 1:05 am

Hi,

how can I totally disable the use of a proxy server?

↧

New Post: How to use HAPPhone APIs to achieve this function

April 24, 2012, 6:06 pm

Hi,

I have run into a problem using XmlReader on WP7: it does not support BIG5 encoding when reading XML content.

Here is what I was trying to do.

using (XmlReader reader = XmlReader.Create("http://feeds.feedburner.com/nownews/politic"))
    while (reader.Read())  // iterate through the document
        switch (reader.NodeType)
        {
            case XmlNodeType.Text:
                string s = reader.Value; // looking for content under all <item>
                break;
        }

I wonder if someone could share a quick code sample so that I can check whether the characters display correctly on the device. Much appreciated!

thsieh

↧

New Post: Validate/fix a html content

April 25, 2012, 2:43 am

Hi all,

Is it possible to validate/fix HTML content with HtmlAgilityPack?

Thank you

↧

New Post: .net 4.5 version and WinRT version

April 25, 2012, 6:08 am

The Windows 8 RC version will be available soon (June). Developers can start writing apps now.

There are no libraries compatible with WinRT yet (compatibility can be verified with WACK).

Html Agility Pack is one of the most interesting libraries for me as a developer.

How can I help with porting Html Agility Pack to .NET 4.5 and WinRT?

Sychev Igor

sychev-igor.90@mail.ru

skype: sychevigormsk

↧

Created Issue: Html Agility Pack does not replace character references [32621]

April 25, 2012, 2:56 pm

This document:

<!DOCTYPE HTML PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"">
<html><body>1&2'3$4</body></html>

shows in Internet Explorer as "1&2'3$4" (without the quotes)

However, loading it into Html Agility Pack's HtmlDocument, calling CreateNavigator on the document, and Evaluate("/html/body") on the navigator returns "1&2'3$4" (again, without the quotes)

This is incorrect, and inconsistent with the XML implementation of XPath that comes with .NET 4.0, which does replace "&amp;" with "&".

↧

New Post: how to get table from another website with method=post

April 26, 2012, 5:25 am

I want to get a table from another website. For testing purposes I made an HTML file and saved it on my desktop with the following code:

<html>
<head>
    <title>Search</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body >
    <center>

     <form method="POST" action="targetsite">

   <input type="submit" value="submit" id="Button1"/>
   <input type="hidden" name="searchpl" value="7884" />
   <input type="HIDDEN" name="NRequest" value="parameterN"/>
        </form>
    </center>
</body>
</html>

When I click the submit button it works fine and redirects me to targetsite. However, I don't want to be redirected to targetsite; instead I want to get the data from the second table on targetsite. I know HtmlAgilityPack can do this (I have tested it on a site and it works fine), but I don't know how to send POST data with HtmlAgilityPack using the information above. Can you help? I get an error when I use this code:

WebClient myWebClient = new WebClient();
var doc = new HtmlDocument();

NameValueCollection myNameValueCollection = new NameValueCollection
{
    {"searchpl", "BZA 7884"},
    {"NRequest", "ChannelType=ct/Browser|RequestType=rt/Business|SubSystemType=st/Payments|AgencyType=at/PVO|ServiceName=PVO_VIO_BY_PL|PageID=PVO_Search|PVO_V_NUMBER=|P_ID=BZA7884|PVO_SEARCH_T=false|PVO_CO=TRUE|PVO_P_TYPE=PAS|PVO_S_NAME=NY"}
};

byte[] responseArray = myWebClient.UploadValues("TargetSite", "Post", myNameValueCollection);
xRow = "/html[1]/body[1]/center[1]/table[1]";
doc.LoadHtml(Encoding.ASCII.GetString(responseArray));
divScrap.InnerHtml = doc.DocumentNode.SelectSingleNode(xRow).InnerText.Trim();

Error detail: An Error Has Occurred Your session has timed out or expired.

↧

Commented Issue: HtmlAgilityPack v1.4.3 parses tables wrong [32107]

April 26, 2012, 5:56 pm

I have installed HtmlAgilityPack via NuGet and it installed version 1.4.3

This version has an error when handling tables!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>hap test table</title>
</head>
<body>
<table>
<tr>
<td>foo</td>
<td>bar</td>
</tr>
</table>
</body>
</html>

becomes

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>hap test table</title>
</head>
<body>
<table>
<tr>
<td>foo
<td>bar
</td></td></tr></table>
</body>
</html>

If I go back to version 1.4.0 then it works like it should...
Comments: ** Comment from web user: DarthObiwan **

So I'm finally able to get back working on HAP (been a few years of long hours and busy life). I am trying to repro this issue and so far with 1.3, 1.4, 1.4.3 I haven't been able to find any difference in InnerHtml nor WriteTo html output. Does anyone have a working example in code they could share that I could look at?

↧

Commented Issue: HtmlAgilityPack v1.4.3 parses tables wrong [32107]

April 26, 2012, 5:59 pm

Note I'm using the information provided in this post, like the original html that is supposed to demonstrate it. I wrote a program to save the output to a text file, made 3 projects in my solution, each referencing a different version of the dll and then compared the results. I've tried WriteTo() and InnerHtml so far

↧

New Comment on "Examples"

April 30, 2012, 4:58 am

Hello, how would I copy the image (.jpg) URL that appears in this code? Code: <TABLE id=uezszu_24 class="uiGrid fbPhotosGrid" cellSpacing=0 cellPadding=0> <TBODY> <TR> <TD class="vTop"> <DIV class=Wrapper><A class="uiMediaThumb uiScrollableThumb uiMediaThumbHuge" href="www.cccc.com/index.php" name=43563463 rel=theater aria-label="photo" ajaxify="dsgdgbdfgr45y6ghd"><I style="BACKGROUND-IMAGE: url(http://www.fressdgf.com/image.jpg)"></I></A></DIV></TD> </TR> </TBODY> </TABLE>

↧

Commented Issue: incorrect parse

May 2, 2012, 1:48 am

Please try to parse this and you will see a problem:

<form class="patrol" method="post" id="patrolForm">
<p class="timeleft">150</p>
</form>
Comments: ** Comment from web user: Siderite **

Duplicate of http://htmlagilitypack.codeplex.com/workitem/32505

↧