I can retrieve the text of a web page, say http://stackoverflow.com/questions, which contains some real and made-up links:
/questions
/tags
/questions?sort=votes
/questions?sort=active
randompage.aspx
../coolhomepage.aspx
Knowing that my originating page was http://stackoverflow.com/questions, is there a way in .NET to resolve the links to this?
http://stackoverflow.com/questions
http://stackoverflow.com/tags
http://stackoverflow.com/questions?sort=votes
http://stackoverflow.com/questions?sort=active
http://stackoverflow.com/questions/randompage.aspx
http://stackoverflow.com/coolhomepage.aspx
Kind of like the way a browser is smart enough to resolve the links.
=========================== Update - Using David's solution:
'Regex to match all <a ... /a> links
Dim myRegEx As New Regex("\<\s*a (?# Find opening <a tag) " & _
    ".+?href\s*=\s*['""] (?# Then all to href=' or "" ) " & _
    "(?<href>.*?)['""] (?# Then all to the next ' or "" ) " & _
    ".*?\> (?# Then all to > ) " & _
    "(?<name>.*?)\<\s*/a\s*\> (?# Then all to </a> ) ", _
    RegexOptions.IgnoreCase Or _
    RegexOptions.IgnorePatternWhitespace Or _
    RegexOptions.Multiline)

'MatchCollection to hold all the links that are matched
Dim myMatchCollection As MatchCollection
myMatchCollection = myRegEx.Matches(Me._RawPageText)

'Loop through all matches and evaluate the value of the href attribute.
For i As Integer = 0 To myMatchCollection.Count - 1
    Dim thisLink As String = ""
    thisLink = myMatchCollection(i).Groups("href").Value

    'This checks for Javascript and Mailto links.
    'This is not complete. There are others to check; I just haven't encountered them yet.
    If thisLink.ToLower.StartsWith("javascript") Then
        thisLink = "JAVASCRIPT: " & thisLink
    ElseIf thisLink.ToLower.StartsWith("mailto") Then
        thisLink = "MAILTO: " & thisLink
    Else
        Dim baseUri As New Uri(Me.URL)
        If Not thisLink.ToLower.StartsWith("http") Then
            'This is a partial URL so we will assume that it's relative to our originating URL
            Dim myUri As New Uri(baseUri, thisLink)
            thisLink = "RELATIVE LOCAL LINK: RESOLVED: " & myUri.ToString() & " ORIGINAL: " & thisLink
        Else
            'The link starts with HTTP; determine whether it is part of the base host or outside it.
            Dim ThisUri As New Uri(thisLink)
            If ThisUri.Host.ToLower = baseUri.Host.ToLower Then
                thisLink = "INSIDE COMPLETE LINK: " & thisLink
            Else
                thisLink = "OUTSIDE LINK: " & thisLink
            End If
        End If
    End If

    'I'm storing the found links into a Generic.List(Of String)
    'This link has descriptive text added to it.
    'TODO: Make collection to hold only unique internal links.
    Me._Links.Add(thisLink)
Next
-
If you mean server-side, you can use ResolveUrl():
string url = ResolveUrl("~/questions");
-
I don't understand what you mean by "resolve" in this context, but since you asked how the browser would handle it, you can try inserting a <base> HTML element.
"The <base> tag specifies a default address or a default target for all links on a page."
-
You mean like this?
Uri baseUri = new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri, "catalog/shownew.htm");
Console.WriteLine(myUri.ToString());
Sample comes from http://msdn.microsoft.com/en-us/library/9hst1w91.aspx
rvarcher: Yes, this is what I needed. This works on URLs originating from different locations. I'll update my question to show how I implemented it. Thanks!
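For reference, here is a minimal sketch of the same Uri(baseUri, relative) idea applied to the links from the question. It is only an illustration, not the asker's final code: the class name ResolveLinksSketch and the mailto address are made up, and it uses Uri.TryCreate plus a scheme check as one possible alternative to testing string prefixes such as "javascript" or "http".

```csharp
using System;

class ResolveLinksSketch
{
    static void Main()
    {
        // Originating page from the question (note: no trailing slash).
        var baseUri = new Uri("http://stackoverflow.com/questions");

        string[] hrefs =
        {
            "/questions",
            "/tags",
            "/questions?sort=votes",
            "/questions?sort=active",
            "randompage.aspx",
            "../coolhomepage.aspx",
            "mailto:someone@example.com" // hypothetical non-HTTP link
        };

        foreach (string href in hrefs)
        {
            // TryCreate returns false instead of throwing on malformed input.
            if (!Uri.TryCreate(baseUri, href, out Uri resolved))
            {
                Console.WriteLine("UNPARSEABLE: " + href);
                continue;
            }

            // Classify by the parsed scheme rather than by string prefix.
            bool isWeb = resolved.Scheme == Uri.UriSchemeHttp
                      || resolved.Scheme == Uri.UriSchemeHttps;
            if (!isWeb)
            {
                Console.WriteLine("NON-HTTP (" + resolved.Scheme + "): " + href);
                continue;
            }

            // Same host as the originating page means an internal link.
            bool inside = string.Equals(resolved.Host, baseUri.Host,
                                        StringComparison.OrdinalIgnoreCase);
            Console.WriteLine((inside ? "INSIDE: " : "OUTSIDE: ") + resolved);
        }
    }
}
```

One thing to watch: relative references are resolved against the directory of the base URL, so with the base http://stackoverflow.com/questions the link randompage.aspx resolves to http://stackoverflow.com/randompage.aspx. To get http://stackoverflow.com/questions/randompage.aspx, the base would need a trailing slash (http://stackoverflow.com/questions/), which in turn changes how ../coolhomepage.aspx resolves.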