Text Extract API Documentation

Text Extract API Implementation instructions

The Text Extract API, used for the extraction of text content from HTML pages, uses a JSON-based RPC format and can be accessed through GET and POST interfaces over HTTP. The API endpoint is located at:

http://api.ai-applied.nl/api/text_extraction_api/

Note: All our APIs allow demo and testing use for free. You can request an API key to test our APIs using the form below, after which an API key with 5000 credits will be e-mailed to the provided e-mail address. Use of the API beyond this limit requires a commercial account and an API key. We will never share your e-mail address with anyone, and use it only to prevent abuse of the demo functionality. At any time you can upgrade your demo key by adding purchased credits to it, or by switching to a subscription plan.

The API call can be passed in JSON format using either the POST or the GET method:

request={
"data":{
"api_key":"DEMO_ACCOUNT",
"call":{
"data":[
{
"body":"<html><head><title>Example page</title></head><body><div>This is a example webpage with text.</div></body></html>"
},
{
"url":"http://www.bbc.co.uk/news/business-22430919"
}
]
}
}
}

The API call contains the following parameters:

Parameter | Obligatory | Description
data | Yes | transport container for the API call
api_key | Yes | your key for the use of this API
call | Yes | the API call information container
data | Yes | a list of JSON-dictionary formatted messages, or a link to a data source API returning such formatted messages (see API nesting documentation)
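
The nesting of these parameters can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names build_payload and build_post_request are hypothetical, and sending the JSON as a form field named "request" is an assumption based on the "request=" prefix shown in the example above. The request is only prepared here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://api.ai-applied.nl/api/text_extraction_api/"


def build_payload(api_key, messages):
    """Wrap a list of message dicts in the data / call / data envelope."""
    return {"data": {"api_key": api_key, "call": {"data": list(messages)}}}


def build_post_request(api_key, messages):
    """Prepare (but do not send) a POST request carrying the payload.

    Assumption: the JSON travels as a form field named "request".
    """
    form = urlencode({"request": json.dumps(build_payload(api_key, messages))})
    return Request(API_URL, data=form.encode("utf-8"), method="POST")


messages = [
    {"body": "<html><head><title>Example page</title></head></html>"},
    {"url": "http://www.bbc.co.uk/news/business-22430919"},
]
req = build_post_request("DEMO_ACCOUNT", messages)
```

Passing req to urllib.request.urlopen would perform the actual call.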

All messages passed in the final data parameter need to be JSON formatted like one of the following examples:

{
"body":"<html><head><title>Example page</title></head><body><div>This is a example webpage with text.</div></body></html>"
}

or

{
"url":"http://www.bbc.co.uk/news/business-22430919"
}

Every message consists of the following parameters:

Parameter | Obligatory | Description
body | No (if url is provided) | a full HTML document, the contents of which are to be processed by the system
url | No (if body is provided) | the URL of a web page which is to be fetched and then processed by the system
id | Yes | a unique message ID as a string or an integer

NOTE: Either body or url MUST always be provided.
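
The rules above can be checked on the client side before a call is made. The sketch below is illustrative only: validate_message is a hypothetical helper, not part of the API. Because the example messages in this document omit id even though the table lists it as obligatory, the sketch checks only the type of id when it is present; that leniency is an assumption.

```python
def validate_message(message):
    """Return a list of problems found in one message dict (empty if none)."""
    problems = []

    # Either body or url must always be provided.
    if not message.get("body") and not message.get("url"):
        problems.append("either 'body' or 'url' must be provided")

    # Assumption: id is only type-checked when present, since the
    # documented examples omit it. bool is excluded explicitly because
    # it is a subclass of int in Python.
    if "id" in message:
        value = message["id"]
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            problems.append("'id' must be a string or an integer")

    return problems
```

A message such as {"url": "http://www.bbc.co.uk/news/business-22430919"} yields an empty list, while an empty dictionary yields one problem.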

Sample call and response interpretation

This example requests from the Text Extract API the extracted texts for the two example documents, one provided in full and one retrieved from a URL. You can copy-paste the code below into your browser URL bar to view the API response:

http://api.ai-applied.nl/api/text_extraction_api/?request={
"data":{
"api_key":"DEMO_ACCOUNT",
"call":{
"data":[
{
"body":"<html><head><title>Example page</title></head><body><div>This is a example webpage with text.</div></body></html>"
},
{
"url":"http://www.bbc.co.uk/news/business-22430919"
}
]
}
}
}
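
When the same GET call is made from a program rather than a browser, the JSON should be percent-encoded, since characters such as quotes, angle brackets and spaces are not valid in a raw URL. The following is a minimal sketch using only the Python standard library; build_get_url is a hypothetical helper name, and the query field name "request" is taken from the example above.

```python
import json
from urllib.parse import urlencode

API_URL = "http://api.ai-applied.nl/api/text_extraction_api/"


def build_get_url(payload):
    """Return the endpoint URL with the payload as an encoded query string."""
    # Compact separators keep the URL short; urlencode percent-escapes
    # every character that is unsafe inside a query string.
    compact = json.dumps(payload, separators=(",", ":"))
    return API_URL + "?" + urlencode({"request": compact})


payload = {
    "data": {
        "api_key": "DEMO_ACCOUNT",
        "call": {
            "data": [
                {"url": "http://www.bbc.co.uk/news/business-22430919"},
            ]
        },
    }
}
url = build_get_url(payload)
```

The resulting URL carries the same request as the call shown above, restricted here to the URL-based message for brevity.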

This call returns the following JSON formatted message:  

{
"status":1,
"id":null,
"response":{
"data":[
{
"url":"",
"text":"This is a example webpage with text.",
"title":"Example page"
},
{
"url":"http://www.bbc.co.uk/news/business-22430919",
"text":" \n\n7 May 2013\n\nLast updated at \n\n13:57 GMT\n\nShare this page\n\nDelicious\n\nDigg\n\nFacebook\n\nreddit\n\nStumbleUpon\n\nTwitter\n\nEmail\n\nPrint\n\nHSBC profits almost double to $8.4bn as bad loans fall\n\nHSBC is confident about future growth in the US and China\n\nContinue reading the main story\n\nBig Banking\n\nRevenues drop 20% at Goldman Sachs\n\nJPMorgan fined $100m for trade loss\n\nBank of America earnings surge\n\nCitigroup profits in slight fall\n\nHSBC almost doubled pre-tax profits to $8.4bn (\u00a35.4bn) in the first three months of 2013 after trading conditions in the bank's key markets improved.\n\nThe rise, an increase of 95% on the same quarter in 2012, came as HSBC reported a big fall in losses from bad debts and provisions for other risks.\n\nLoan impairment charges fell 51% to $1.17bn, with the fall most notable in the US, HSBC said.\n\nChief executive Stuart Gulliver said the US would continue to strengthen.\n\nIn China, after a slower start to the year, Mr Gulliver said he expected the economy \"to accelerate\" during 2013.\n\nCutting costs\n\nSince taking over in early 2011, Mr Gulliver has been trying to streamline operations, reduce complexity and cut divisions that are unprofitable. HSBC has sold or closed 52 businesses since he became chief executive.\n\n\"We have strengthened our capital position and remain one of the best-capitalised banks in the world, allowing us both to invest in organic growth and grow dividends,\" the company said in a statement.\n\nContinue reading the main story\n\nHSBC Holdings\n\nLast Updated at 23 Oct 2013, 13:11 GMT\n\n*Chart shows local time\n\nprice\n\nchange\n\n%\n\n676.60 p\n\n-\n\n-11.10\n\n-\n\n-1.61\n\nIn March, HSBC, which has eliminated about $3.6bn of costs, said there was room for a further $1bn in savings this year.\n\nCosts in the first quarter were down 10% from a year ago, and now consist of about 53% of income. 
The bank is aiming to get the percentage below 52% by the end of the year. \n\nAcross Europe, smaller rivals are also cutting back, with French banks Societe Generale and Credit Agricole on Tuesday saying they must keep cutting costs to help offset a weak domestic economy. \n\nRichard Hunter, head of equities at Hargreaves Lansdown Stockbrokers, said: \"Set against a mixed bag of trading updates so far from its peers, HSBC has delivered a statement which not only ticks all of the boxes, but propels the bank to premier status in the sector.\"\n\nThe profit figures were higher than many analysts had forecast, and HSBC shares were up almost 3% in afternoon trading.\n",
"title":"BBC News - HSBC profits almost double to $8.4bn as bad loans fall"
}
],
"description":"OK: Call processed.",
"success":true
}
}

The following parameters in the API reply belong to the transport layer and can be stripped away in case of success:

Parameter | Description
status | has value 1 if the transport of the message has been conducted successfully through all systems
id | an optional callback parameter that can be ignored for this API
response | contains the response from the API extractor

The "response" parameter contains this API's reply and consists of the following subparameters:

Parameter | Description
success | indicates whether the API call has been completed with success (==true), or has failed (==false)
description | gives a description of the API call's success or failure as a string
data | contains a list of extracted texts provided by the API in a JSON format
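
Stripping the transport layer and checking both success indicators might look like the sketch below. The names unwrap_reply and TextExtractError are hypothetical, and the exact shape of a failed reply is not documented here, so the error handling assumes only that status differs from 1 or that success is not true.

```python
import json


class TextExtractError(Exception):
    """Raised when the reply signals a transport or API failure."""


def unwrap_reply(raw_json):
    """Parse a raw reply string and return the list of extracted texts."""
    reply = json.loads(raw_json)

    # Transport layer: status is 1 when the message passed through all systems.
    if reply.get("status") != 1:
        raise TextExtractError("transport failure: status=%r" % reply.get("status"))

    # API layer: success and description live inside "response".
    response = reply.get("response") or {}
    if response.get("success") is not True:
        raise TextExtractError(response.get("description", "call failed"))

    return response.get("data", [])
```

On success the function returns only the inner data list, which is the part most callers need.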

All messages returned in the data parameter are JSON formatted like the following example:

{
"url":"",
"text":"This is a example webpage with text.",
"title":"Example page"
}

The returned messages always contain the following parameters: 

Parameter | Description
url | the URL of the document that the text has been extracted from, if known
title | the title of the document that the text has been extracted from, if known
text | the text extracted from the document by the Text Extract API
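
For further processing it can be convenient to map each returned message onto a small typed record. The sketch below is one possible approach: ExtractedDocument and to_document are hypothetical names, and treating an empty string as "not known" is an assumption drawn from the example above, where url is "" for a document supplied through body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedDocument:
    text: str
    title: Optional[str]
    url: Optional[str]


def to_document(message):
    """Convert one returned message dict into an ExtractedDocument."""
    return ExtractedDocument(
        text=message.get("text", ""),
        # Assumption: an empty or missing value means "not known".
        title=message.get("title") or None,
        url=message.get("url") or None,
    )


doc = to_document(
    {
        "url": "",
        "text": "This is a example webpage with text.",
        "title": "Example page",
    }
)
```

Here doc.url is None because the example document was passed in full rather than fetched from a URL.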

The results obtained from the Text Extract API can also be used as part of further message processing by our APIs, such as the Sentiment Analysis API.

If you need more assistance with the implementation of this API, please don't hesitate to comment below, or contact us for assistance!