
Get product information from html source - regex


hamidjoukar


When I read the HTML source of the link below

http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html

I can find the following data about the product:

<script type="text/javascript">
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42356']['inexistence']=0;
DL.item.stock['ss42356']['down_shelf']=0;
DL.item.stock['ss42356']['procurement_cycle']='8';
DL.item.stock['ss42356']['paid_set']=[];
DL.item.stock['ss42356']['paid_set'].push(35630);
DL.item.color_image['35630']='of7ea7';
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
DL.item.stock['ss42357']['inexistence']=0;
DL.item.stock['ss42357']['down_shelf']=0;
DL.item.stock['ss42357']['procurement_cycle']='6';
DL.item.stock['ss42357']['paid_set']=[];
DL.item.stock['ss42357']['paid_set'].push(35631);
DL.item.color_image['35631']='of710e';
DL.item.stock['ss42358']=[];
DL.item.stock['ss42358']['qty']=14;
DL.item.stock['ss42358']['sku']='SV000837_BR';
DL.item.stock['ss42358']['inexistence']=0;
DL.item.stock['ss42358']['down_shelf']=0;
DL.item.stock['ss42358']['procurement_cycle']='17';
DL.item.stock['ss42358']['paid_set']=[];
DL.item.stock['ss42358']['paid_set'].push(35632);
DL.item.color_image['35632']='of77c1';
DL.item.stock['ss42359']=[];
DL.item.stock['ss42359']['qty']=36;
DL.item.stock['ss42359']['sku']='SV000837_O';
DL.item.stock['ss42359']['inexistence']=0;
DL.item.stock['ss42359']['down_shelf']=0;
DL.item.stock['ss42359']['procurement_cycle']='7';
DL.item.stock['ss42359']['paid_set']=[];
DL.item.stock['ss42359']['paid_set'].push(35633);
DL.item.color_image['35633']='of7136';
</script>

I need to know the quantity for each SKU, so I need to produce a simple array containing each SKU name and its quantity, like below:

$a = array( 'SV000837_B'  => '56',
            'SV000837_G'  => '29',
            'SV000837_BR' => '14',
            'SV000837_O'  => '36',
          );
Please help me write PHP code, using regex or any other method, to produce the above array.
 

Try

<?php

// webpage you are scraping the javascript code from
$page_url = 'http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html';

// load the webpage into DOMDocument
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($page_url);

// use XPath to return the second <script> element inside the <div class="dd1"> element
// this is where the javascript code containing the stock array is in the webpage
$xpath = new DOMXPath($doc);
$result = $xpath->query('//div[@class="dd1"]/script[2]');

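// retrieve the node element value
// item(0) returns null when the XPath query matches nothing, so check before using it
$script = $result->item(0);
$JS_stock_array_code = $script ? $script->nodeValue : '';

// use regex to find the qty and sku values
// the "s" modifier lets "." match newlines, so the pattern works whether the
// statements sit on one line or on separate lines; \1 ties the sku to the same stock key
preg_match_all("~\[('\w+')\]\['qty'\]=(\d+);.*?\[\\1\]\['sku'\]='(\w+)'~s", $JS_stock_array_code, $matches);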

// loop through the results and build the sku array
// the sku is used as the array key
// the quantity is assigned to the sku
$skus = array();
foreach($matches[3] as $key => $sku)
{
    $qty = $matches[2][$key];
    $skus[$sku] = $qty;
}

// output $skus array
printf('<pre>%s</pre>', print_r($skus, true));

Output for me is

Array
(
    [SV000837_B] => 49
    [SV000837_G] => 26
    [SV000837_BR] => 11
    [SV000837_O] => 35
)
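
If you want to check the parsing step without fetching the live page, here is a minimal sketch that runs against a sample pasted from the question. It is an alternative to the single regex above, not a replacement for it: it collects qty and sku in two separate passes and joins them on the stock id, so it does not depend on the order of the statements. The helper name stock_field() is made up for this example, and the sample only includes the first two SKUs.

```php
<?php

// Sample copied from the question (first two SKUs only), so no network access is needed.
$js = <<<'JS'
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42356']['inexistence']=0;
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
JS;

// Hypothetical helper (not part of the code above): collect one field
// per stock id, e.g. array('ss42356' => '56') for the 'qty' field.
function stock_field($js, $field)
{
    // matches ['<id>']['<field>']=<value>; where the value may be quoted or not
    $pattern = "~\['(\w+)'\]\['" . preg_quote($field, '~') . "'\]='?(\w+)'?;~";
    preg_match_all($pattern, $js, $m);
    return $m[1] ? array_combine($m[1], $m[2]) : array();
}

$qty = stock_field($js, 'qty');
$sku = stock_field($js, 'sku');

// join the two maps on the stock id; ids that have no qty are skipped
$skus = array();
foreach ($sku as $id => $name) {
    if (isset($qty[$id])) {
        $skus[$name] = $qty[$id];
    }
}

print_r($skus);
```

For the sample above, $skus ends up as SV000837_B => 56 and SV000837_G => 29. The quantities are strings because they come straight out of the regex match; cast them with (int) if you need numbers.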
