GAS Library | Html parse for web scraping

How to add

You can add this library with the key below (for the legacy script editor).

MJqa3Uidm9a8fNR_0snRPwKWZ8rqdjnSl

If you want to know how to use this key, please check this guidance.

How to use

1. Get the HTML text with UrlFetchApp, PhantomJsCloud (if the page needs scripts to run before its HTML is complete), or another tool.

2. Create an Html object from the HTML text as shown below.
(Pass the HTML text string as the argument.)

var html = Html.parse(htmlText); // htmlText is the HTML string from step 1; returns an Html object
console.log(html.tree()); // check the structure of the HTML and find the XPath of each Element
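
Steps 1 and 2 can be combined in a small helper. The following is a minimal sketch, not part of the library: fetchAndParse is a hypothetical name, and the fetcher and parser are passed in as arguments (in Apps Script you would pass UrlFetchApp and Html) so the function can also be tried outside Apps Script with stand-ins. It assumes that only an HTTP 200 response carries usable HTML.

```javascript
/**
 * Fetch a page and turn it into an Html object (hypothetical helper).
 * fetchApp: object with fetch(url, params), e.g. UrlFetchApp in Apps Script.
 * htmlLib:  object with parse(htmlText), e.g. this library's Html.
 * Returns null when the server does not answer with HTTP 200.
 */
function fetchAndParse(url, fetchApp, htmlLib) {
  // muteHttpExceptions makes fetch() return the response instead of
  // throwing when the server answers with an error status.
  var response = fetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    return null;
  }
  return htmlLib.parse(response.getContentText());
}
```

In Apps Script, a call might look like fetchAndParse('https://example.com/', UrlFetchApp, Html), followed by console.log(html.tree()) on the result to inspect the structure.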

3. Use the Html class methods to get the Element you want to reach.
(Pass the XPath string as the argument, for example: '/body/div/header[1]/div')

var elm = html.getElmX(xpath); // xpath is the XPath string; returns an Element object, or null if nothing matches

4. Use the Element class methods to get the details of this Element.

var tagName = elm.tag(); // get the tag name
var attObj = elm.att(); // get the attributes as an object
var innerText = elm.innerText(); // get the inner text (not including inner HTML)
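
Because getElmX returns null when nothing matches, steps 3 and 4 usually need a null check between them. The sketch below shows one way to do that; describeElement is a hypothetical helper name, and it relies only on the getElmX, tag, att, and innerText methods described on this page.

```javascript
/**
 * Look up one Element by XPath and collect its basic details
 * (hypothetical helper, not part of the library).
 * html:  an Html object returned by Html.parse().
 * xpath: an XPath string taken from the html.tree() output.
 * Returns null when getElmX() finds no matching Element.
 */
function describeElement(html, xpath) {
  var elm = html.getElmX(xpath);
  if (!elm) {
    return null; // no fitting Element was found
  }
  return {
    tag: elm.tag(),        // tag name
    attributes: elm.att(), // object of attribute name/value pairs
    text: elm.innerText()  // inner text without inner HTML
  };
}
```

A caller can then test the result once, for example with if (info !== null), instead of guarding every method call separately.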

Class Html

Provides access to all Elements contained in this Html.

| Method | Return type | Brief description |
| --- | --- | --- |
| tree() | String | Builds a structural description of the whole HTML for the programmer's convenience. It also includes the XPath and attributes of each tag. While writing code, use this method to confirm the XPath and attributes of the Element you want to reach. |
| getElmX('XPath') | Element | Searches the whole HTML and returns the Element that matches the 'XPath' string. Returns null if no Element matches. |
| getElmId('idValue') | Element | Searches the whole HTML and returns an Element whose id attribute matches the 'idValue' string. Returns null if no Element matches. |
| getElmIds('idValue') | [Elements] | Searches the whole HTML and returns an array of all Elements whose id attribute matches the 'idValue' string. Returns null if no Element matches. |
| getElmClass('clsValue') | Element | Searches the whole HTML and returns an Element whose class attribute matches the 'clsValue' string. Returns null if no Element matches. |
| getElmClasses('clsValue') | [Elements] | Searches the whole HTML and returns an array of all Elements whose class attribute matches the 'clsValue' string. Returns null if no Element matches. |
| getElmTag('tagName') | Element | Searches the whole HTML and returns an Element whose tag matches the 'tagName' string. Returns null if no Element matches. |
| getElmTags('tagName') | [Elements] | Searches the whole HTML and returns an array of all Elements whose tag matches the 'tagName' string. Returns null if no Element matches. |
| getElmAtt(attObject) | Element | Searches the whole HTML and returns an Element whose attributes include every key of attObject, with each attribute value matching the corresponding attObject value. Returns null if no Element matches. |
| getElmAtts(attObject) | [Elements] | Searches the whole HTML and returns an array of all Elements whose attributes include every key of attObject, with each attribute value matching the corresponding attObject value. Returns null if no Element matches. |
| html() | String | Returns the whole HTML text obtained by structural analysis. |
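
The plural methods are documented as returning null rather than an empty array when nothing matches, so a loop over their result needs a fallback. The following sketch collects one attribute from every Element with a given tag; collectAttribute is a hypothetical helper name, and it assumes att() returns a plain object keyed by attribute name, as described for the Element class.

```javascript
/**
 * Collect the value of one attribute from every Element with the given tag
 * (hypothetical helper, not part of the library).
 * root: an Html object, or any object offering getElmTags().
 * Elements that lack the attribute are skipped.
 */
function collectAttribute(root, tagName, attName) {
  // getElmTags() is documented to return null when nothing matches,
  // so fall back to an empty array before looping.
  var elements = root.getElmTags(tagName) || [];
  var values = [];
  for (var i = 0; i < elements.length; i++) {
    var att = elements[i].att();
    if (att && Object.prototype.hasOwnProperty.call(att, attName)) {
      values.push(att[attName]);
    }
  }
  return values;
}
```

For instance, collectAttribute(html, 'a', 'href') would gather the link targets of a parsed page, returning an empty array when the page has no a tags.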

Class Element

Gets the details of this Element and provides access to all Elements located directly under it.

| Method | Return type | Brief description |
| --- | --- | --- |
| tree() | String | Builds a structural description of the whole HTML for the programmer's convenience. It also includes the XPath and attributes of each tag. While writing code, use this method to confirm the XPath and attributes of the Element you want to reach. |
| level() | Integer | Returns the hierarchy depth of this Element. Usually the html tag is level 1, and the head and body tags located directly under the html tag are level 2. |
| tag() | String | Returns the tag name of this Element. |
| topTag() | String | Returns the whole start tag HTML of this Element. |
| lastTag() | String | Returns the whole end tag HTML of this Element. |
| innerHtml() | String | Returns the inner HTML of the included Elements (this Element and all Elements located directly under it). Does not include inner text. |
| innerText() | String | Returns the inner text of the included Elements (this Element and all Elements located directly under it). Does not include inner HTML. |
| att() | Object | Returns an Object containing all attribute information of this Element. In each key-value pair, the key is the attribute's name and the value is the attribute's value. |
| getElmX('XPath') | Element | Searches the included Elements and returns the Element that matches the 'XPath' string. Returns null if no Element matches. Caution: pass only the part of the XPath that comes after this Element. |
| getElmId('idValue') | Element | Searches the included Elements and returns an Element whose id attribute matches the 'idValue' string. Returns null if no Element matches. |
| getElmIds('idValue') | [Elements] | Searches the included Elements and returns an array of all Elements whose id attribute matches the 'idValue' string. Returns null if no Element matches. |
| getElmClass('clsValue') | Element | Searches the included Elements and returns an Element whose class attribute matches the 'clsValue' string. Returns null if no Element matches. |
| getElmClasses('clsValue') | [Elements] | Searches the included Elements and returns an array of all Elements whose class attribute matches the 'clsValue' string. Returns null if no Element matches. |
| getElmTag('tagName') | Element | Searches the included Elements and returns an Element whose tag matches the 'tagName' string. Returns null if no Element matches. |
| getElmTags('tagName') | [Elements] | Searches the included Elements and returns an array of all Elements whose tag matches the 'tagName' string. Returns null if no Element matches. |
| getElmAtt(attObject) | Element | Searches the included Elements and returns an Element whose attributes include every key of attObject, with each attribute value matching the corresponding attObject value. Returns null if no Element matches. |
| getElmAtts(attObject) | [Elements] | Searches the included Elements and returns an array of all Elements whose attributes include every key of attObject, with each attribute value matching the corresponding attObject value. Returns null if no Element matches. |
| html() | String | Returns the included HTML text obtained by structural analysis. |
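
Since every Element offers the same search methods as Html, a search can be narrowed step by step: find a container first, then search only inside it. The sketch below reads a table into a two-dimensional array this way. tableToRows is a hypothetical helper name, and it assumes the table uses plain tr and td tags with no nested tables (a nested table's rows would also be picked up by getElmTags).

```javascript
/**
 * Read the cell texts of a table into an array of rows
 * (hypothetical helper, not part of the library).
 * root:    an Html object or an Element.
 * tableId: value of the id attribute of the table to read.
 * Returns [] when the table is not found or has no rows.
 */
function tableToRows(root, tableId) {
  var table = root.getElmId(tableId);
  if (!table) {
    return [];
  }
  // Search only inside the table; null means "no match" in this library.
  var rows = table.getElmTags('tr') || [];
  return rows.map(function (row) {
    var cells = row.getElmTags('td') || [];
    return cells.map(function (cell) {
      return cell.innerText();
    });
  });
}
```

The resulting array of arrays has the shape that Range.setValues() expects in Apps Script, provided every row has the same number of cells.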
