At 19 MAY 2009 11:05:47PM Bruce Cameron wrote:

I need to strip all HTML code from a string, leaving the text intact. There are several functions out there that can do this, such as http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=6269&lngWId=4 and strip_tags (PHP).

What would be the best way to accomplish calling these external functions?


At 19 MAY 2009 11:37PM Hank Huizinga wrote:

Here's how you would code the link you gave. I did not test this…

* Build the VBScript source one line per field, then convert @FM to CRLF

script = 'Function StripHTML(asHTML)'

script<-1> = 'Dim loRegExp'

script<-1> = ''

script<-1> = 'Set loRegExp=New RegExp'

script<-1> = 'loRegExp.Pattern="<[^>]*>"'

script<-1> = 'StripHTML=loRegExp.Replace(asHTML, "")'

script<-1> = 'Set loRegExp=Nothing'

script<-1> = 'End Function'

SWAP @FM WITH CHAR(13):CHAR(10) IN script

* Load the script into the script control and run it

oWscript=OleCreateInstance('ScriptControl')

OlePutProperty(oWscript, 'Language', 'VBScript')

zz=OleCallMethod(oWscript, "AddCode", script)

HTML='HTML text'

strippedHTML=OleCallMethod(oWscript, "Run", "StripHTML", HTML)

oWscript=''


At 20 MAY 2009 01:01AM Bob Carten wrote:

The regular expression engine is itself an OLE object that you can use inside OI. This offers interesting possibilities for indexed symbolics.

the progid is

oRegEx=OleCreateInstance("VBScript.RegExp")

For a code example see

OleregExp


At 20 MAY 2009 02:27PM Bruce Cameron wrote:

Thanks Hank.

I tested this out and it only stripped the front part.

I passed it "Hello" and it returned

"Hello".


At 20 MAY 2009 02:28PM Bruce Cameron wrote:

Hey Bob.

I'm not sure I follow what you are saying in this post. Can you expand?

Thanks.


At 20 MAY 2009 11:14PM Hank Huizinga wrote:

I ran it as vbscript outside of OI and it did the same thing. I suspect it has more to do with the vbscript code.
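
A likely explanation for the partial result: the VBScript RegExp object's Global property defaults to False, so Replace substitutes only the first match. The effect is easy to reproduce outside OI. The sketch below uses Python purely for illustration (it is not OI code and not from the posts above); the sample string and the function names are made up.

```python
import re

TAG = re.compile(r"<[^>]*>")  # one tag: '<', any run of non-'>' characters, '>'

def strip_first_tag(html: str) -> str:
    """Mimics RegExp.Replace with Global=False: only the first match is removed."""
    return TAG.sub("", html, count=1)

def strip_all_tags(html: str) -> str:
    """Mimics RegExp.Replace with Global=True: every match is removed."""
    return TAG.sub("", html)

sample = "<b>Hello</b>"          # hypothetical input
print(strip_first_tag(sample))   # Hello</b>
print(strip_all_tags(sample))    # Hello
```

In the VBScript above, setting loRegExp.Global = True before the Replace call should give the second behaviour.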


At 20 MAY 2009 11:21PM Bob Carten wrote:

Hi Bruce

I was trying to say that you can call the regular expression engine from OI. In the example below I run the HTML through several filters.

Note - I just googled the filters without understanding them, so the output is not exactly what you need. More time with Google and a regular expression tutorial will get you a better result.

The general concept is that regular expressions do sophisticated searching and matching on any text. One could create calculated fields that are very smart - like finding repeated words, parsing emails or addresses from raw text, finding multiline comments, etc. Note that regular expressions are difficult to master and OI has good string handling, so we can usually avoid using them. But it is nice to know they are available.

function StripHTML_Example(HTML)

/*

Whipped together from a bunch of googled regular expressions

*/

Equ true$ To 1

Declare Function Ole_GetWebPage

html=Ole_GetWebPage('http://www.revelation.com')

If Assigned(html) Else html=''

oRegEx=OleCreateInstance("VBScript.RegExp")

oRegEx->Global=true$

oRegEx->IgnoreCase=true$

* Kill Scripts

* See http://regexblogs.com/blogs/dneimke/archive/2004/01/17/200.aspx

oRegEx->Pattern="<script[^>]*>.*?</script[^>]*>"

work= oRegEx->Replace(HTML, "")

* Kill css

oRegEx->Pattern="^\t*[a-zA-Z0-9\.# -_:@]+\t*\{.*\t*$"

work=oRegEx->Replace(work, "")

* Kill Comments

* http://www.webmasterworld.com/forum88/11584.htm

oRegEx->Pattern='/(\/\*(?!()).*?\*\/|\/\/.*?\r\n+|\r\n+)/s'

work=oRegEx->Replace(work, "")

* Kill HTML tags

oRegEx->Pattern="<[^>]*>"

ans=oRegEx->Replace(work, "")

Return ans
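
The filter-chain idea in the function above (remove scripts, then styles, then comments, then whatever tags are left) can be tried outside OI before committing to patterns. The following is a minimal Python sketch for illustration only; the patterns, the names STAGES and strip_html, and the sample page are assumptions and are not the expressions from the post.

```python
import re

# Each stage is (description, compiled pattern). The patterns are
# illustrative assumptions, not the ones used in the post above.
STAGES = [
    ("script blocks", re.compile(r"<script\b[^>]*>.*?</script\s*>", re.I | re.S)),
    ("style blocks", re.compile(r"<style\b[^>]*>.*?</style\s*>", re.I | re.S)),
    ("HTML comments", re.compile(r"<!--.*?-->", re.S)),
    ("remaining tags", re.compile(r"<[^>]*>")),
]

def strip_html(html: str) -> str:
    """Run the text through each filter in order, then collapse whitespace."""
    work = html
    for _name, pattern in STAGES:
        work = pattern.sub("", work)
    return re.sub(r"\s+", " ", work).strip()

page = """<html><head><style>p { color: red; }</style>
<script type="text/javascript">var x = "<b>";</script></head>
<body><!-- banner --><p>Hello <b>world</b></p></body></html>"""
print(strip_html(page))  # Hello world
```

Order matters here: if the generic tag filter ran first, the bodies of the script and style blocks would be left behind as text.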


At 21 MAY 2009 11:47AM Bruce Cameron wrote:

Bob, I gotcha, that's great info. Thanks.


At 21 MAY 2009 05:50PM Bruce Cameron wrote:

Bob,

A continuation of the question…

I can take and open an xxxx.html file in IE for example.

I can then do a "File -> Save As" plain text (txt) and all the

html code is stripped.

Is there a way to do that in OI using the ole Shell.Explorer?

Just a thought.

Bruce


At 21 MAY 2009 10:14PM Bob Carten wrote:

Oh, you meant that way

Make a form with an OLE control named BROWSER and a textbox named RESULT,

then add a button with this script:

url="http://www.google.com"

Call Send_Message(@window:'.BROWSER', 'Navigate2', url)

loop

call yield()
readystate=get_property(@window:'.BROWSER', 'ReadyState')

until readystate=4

repeat

text=Get_Property(@window:'.BROWSER', 'Document.Body.OuterText')

x=Set_Property(@window:'.RESULT', 'TEXT', text)


At 22 MAY 2009 10:42AM Bruce Cameron wrote:

That is sweet.

I modified it so that a function writes my string out to "c:\temp\xxx.html", starts the window invisible, puts "c:\temp\xxx.html" in for the URL, does a send_event click to run the code, and grabs the .RESULTS. Works for me for now! Thanks Bob!


At 22 MAY 2009 05:19PM Bruce Cameron wrote:

If I wanted to convert your example into a function would it be…

Function StripHtml(datastring)

*

oleObj=OleCreateInstance('Shell.Explorer')

status=OleStatus()

if status then return 'OLE Error code: ':status

* It doesn't work here on the method, not sure what is the correct way

ReturnValue=OleCallMethod(oleObj, "Navigate2","www.google.com")

status=OleStatus()

if status then return 'OLE Error code: ':status

loop

 call yield()
 readystate=OleGetProperty(oleObj, 'ReadyState')

until readystate=4

repeat

strippedstring=OleGetProperty(oleObj, 'Document.Body.OuterText')

status=OleStatus()

if status then return 'OLE Error code: ':status

Return strippedstring


At 22 MAY 2009 06:01PM Bob Carten wrote:

Bruce,

Shell.Explorer wants an owner window to display the HTML.

It will not work as a plain OLE object. You need to use InternetExplorer.Application instead.

For example:

Subroutine Ole_GetWebText(void)

/*

** Pull up a web page, strip out the text

*/

IE=OleCreateInstance('InternetExplorer.Application')

status=OleStatus()

if status then

return 'OLE Error code: ':status

End

IE->Visible=0

x=IE->Navigate("http://www.google.com")

Loop

Call Yield()

while IE->Busy

repeat

doc=IE->Document

body=doc->Body

result=body->InnerText

x=IE->Quit()

Return result
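
The routine above hands the markup to a real HTML engine and asks for the text, rather than pattern-matching tags. For comparison, the same idea can be sketched without a browser by using a parser. This Python example (not OI code, and not part of the original posts) uses the standard library's html.parser; the names TextExtractor and html_to_text are hypothetical, and unlike InnerText it makes no attempt to reproduce browser layout.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace so the result reads as plain text.
        return " ".join("".join(self._chunks).split())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

print(html_to_text("<p>Fish &amp; chips<script>var a = 1;</script></p>"))  # Fish & chips
```

A parser also decodes entities such as &amp;, which a tag-stripping regular expression leaves untouched.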


At 23 MAY 2009 03:15PM Bruce Cameron wrote:

Bob,

Perfect! I changed it to a function and added an arg for datalocation so you can put in a URL or a path and filename.

Thank you, again.

bc
