I need to fetch lessons from an online school timetable into an array so that I can insert the data into my database. The online timetable (URL: roosters-hd.stenden.com) looks like this:

The times run down the left-hand side and the school days (Mo, Tu, We, Th, Fr) run across the top. Very basic.

Each lesson contains 6 values that I need to fetch.

Besides that, I also need to fetch the [StartDate] and [EndDate]. The time depends on which row the lesson cell is in and how large its rowspan is. The date can be calculated by adding the column number to the start date (printed at the top). So in the end the array would look something like this:

[0] => Array
        (
            [0] => Array
                (
                    [Name] => Financiering
                    [Type] => WC
                    [Code] => DECBE3
                    [Classroom] => E2.053 - leslokaal
                    [Teacher] => Verboeket, Erik (E)
                    [Class] => BE1F, BE1B, BE1A
                    [StartDate] => 04/06/2013 08:30:00
                    [EndDate] => 04/06/2013 10:00:00
                )
                etc.

Because of my lack of experience in fetching data, I will probably end up with a highly inefficient and inflexible solution. Should I use an XML parser? Or regex? Any ideas on how to tackle this problem?

  • Please, not regex! stackoverflow.com/a/1732454/2170192 Commented Jul 11, 2013 at 19:49
  • Yes, not regex. Regex is for parsing strings; it is very powerful, but it should not be used for this kind of parsing. Also, the link you posted returns 400 Bad Request. It would be good to see a live example; you can put it on jsfiddle.net. Commented Jul 11, 2013 at 19:53
  • Fixed the link. I don't have an example right now, since I'm not sure where to start. By that I mean the correct, efficient way of fetching the data. Commented Jul 11, 2013 at 19:56

1 Answer

The regex way:

<pre><?php
$html = file_get_contents('the_url.html');

$clean_pattern = <<<'LOD'
~
  # definitions
    (?(DEFINE)
        (?<start>         <!--\hSTART\hOBJECT-CELL\h-->                    ) 
        (?<end>           (?>[^<]++|<(?!!--))*<!--\hEND\hOBJECT-CELL\h-->  )

        (?<next_cell>     (?>[^<]++|<(?!td\b))*<td[^>]*+>  ) 
        (?<cell_content>  [^<]*+                           )
    )

  # pattern
    \g<start>
        \g<next_cell>     (?<Name>      \g<cell_content>   )  
        \g<next_cell>     (?<Type>      \g<cell_content>   )
        \g<next_cell>     (?<Code>      \g<cell_content>   )

        \g<next_cell>     (?<Classroom> \g<cell_content>   )
        \g<next_cell>

        \g<next_cell>     (?<Teacher>   \g<cell_content>   )
        \g<next_cell>     
        \g<next_cell>     (?<Class>     \g<cell_content>   )
    \g<end>
~x
LOD;

preg_match_all($clean_pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo <<<LOD
    Name: {$match['Name']}
    Type: {$match['Type']}
    Code: {$match['Code']}
    Classroom: {$match['Classroom']}
    Teacher: {$match['Teacher']}
    Class: {$match['Class']}<br/><br/>
LOD;
}

The DOM/XPath way:

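$doc = new DOMDocument();
// "@" suppresses the warnings libxml raises on malformed real-world HTML
@$doc->loadHTMLFile('the_url.html');
$xpath = new DOMXPath($doc);
// every element that has a " START OBJECT-CELL " comment as a direct child
$elements = $xpath->query("//*[comment() = ' START OBJECT-CELL ']");
$fields = array('Name', 'Type', 'Code', 'Classroom', 'Teacher', 'Class');
// line indexes to drop from the cell text, highest first
$not_needed = array(10, 8, 6, 1, 0);
$result = array();
foreach ($elements as $element) {
    // one line of the cell text per array item
    $temp = explode("\n", $element->nodeValue);
    foreach ($not_needed as $val) { unset($temp[$val]); }
    $temp = array_map('trim', $temp);
    // pair the six remaining values with the field names
    $result[] = array_combine($fields, $temp);
}
print_r($result);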
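
The question also needs the row, column and rowspan of each lesson cell. A minimal sketch of one way to get them with DOM/XPath follows. It assumes a single flat table in which the lessons sit directly in td cells; the sample markup and the helper name cellPositions are invented for illustration and are not taken from the real timetable. The grid bookkeeping is needed because a cell with a rowspan pushes the cells of the following rows one column to the right.

```php
<?php
// Invented sample markup: the real timetable's structure is an assumption.
$sample = '<table>
  <tr><th></th><th>Mo</th><th>Tu</th></tr>
  <tr><td>08:30</td><td rowspan="2">Lesson A</td><td>Lesson B</td></tr>
  <tr><td>09:00</td><td>Lesson C</td></tr>
</table>';

// Hypothetical helper: returns row, column and rowspan for every cell of the
// first table in the document, accounting for slots taken by earlier rowspans.
function cellPositions($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);            // "@" hides libxml warnings on sloppy HTML
    $xpath = new DOMXPath($doc);

    $occupied = array();               // $occupied[$row][$col] is set once a slot is taken
    $cells = array();
    $row = 0;
    foreach ($xpath->query('(//table)[1]/tr | (//table)[1]/tbody/tr') as $tr) {
        $col = 0;
        foreach ($xpath->query('td | th', $tr) as $cell) {
            // skip slots covered by a rowspan from a previous row
            while (isset($occupied[$row][$col])) {
                $col++;
            }
            // a missing attribute yields '', which casts to 0, hence max(1, ...)
            $rowspan = max(1, (int) $cell->getAttribute('rowspan'));
            $colspan = max(1, (int) $cell->getAttribute('colspan'));
            // mark every grid slot this cell covers
            for ($r = 0; $r < $rowspan; $r++) {
                for ($c = 0; $c < $colspan; $c++) {
                    $occupied[$row + $r][$col + $c] = true;
                }
            }
            $cells[] = array(
                'text'    => trim($cell->textContent),
                'row'     => $row,
                'column'  => $col,
                'rowspan' => $rowspan,
            );
            $col += $colspan;
        }
        $row++;
    }
    return $cells;
}

print_r(cellPositions($sample));
```

In the sample, "Lesson C" is only the second td of its row, but it is reported in column 2 because column 1 is still covered by the rowspan of "Lesson A". Nested tables inside the lesson cells would need a more specific XPath expression.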

5 Comments

I tried your raw pattern in Rubular, but it doesn't seem to match anything. rubular.com/r/xwfwYKy13S .
@JasperJ: Rubular is for Ruby, not for PHP. The best test you can do is IN YOUR CODE! Otherwise, you can use regex.larsolavtorvik.com, which is designed for PHP.
Right, stupid me. I tried preg_match_all($raw_pattern, $data, $out); with $data being the result of file_get_contents on the URL. But still no success (PHP 5.3.26). I will wait for updates.
I like your XPath version. I still need to be able to calculate the start and end date, though. Is there any way to fetch the column name/number, row name/number and rowspan of the cell?
@JasperJ: I have just seen that you need the start and end date-time.
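
Once the row, column and rowspan of a lesson cell are known, the start and end date-time described in the question follow from simple date arithmetic. The sketch below is only an illustration: the helper name lessonTimes, the 30-minute slot length and the 08:30 first row are assumptions, not values read from the real timetable, and the week start is passed in as the Monday printed at the top.

```php
<?php
// Hypothetical helper: the slot length and first-row time are assumed defaults.
function lessonTimes($weekStart, $column, $row, $rowspan, $firstSlot = '08:30', $slotMinutes = 30)
{
    // Date: the week start (Monday) plus the zero-based day column.
    $start = new DateTime($weekStart . ' ' . $firstSlot);
    $start->modify('+' . (int) $column . ' days');
    // Start time: zero-based row index times the slot length.
    $start->modify('+' . ((int) $row * $slotMinutes) . ' minutes');
    // End time: the cell covers $rowspan slots.
    $end = clone $start;
    $end->modify('+' . ((int) $rowspan * $slotMinutes) . ' minutes');

    return array(
        'StartDate' => $start->format('d/m/Y H:i:s'),
        'EndDate'   => $end->format('d/m/Y H:i:s'),
    );
}

// Tuesday (column 1), first time row, spanning three slots
print_r(lessonTimes('2013-06-03', 1, 0, 3));
```

With these assumed defaults, the call reproduces the StartDate and EndDate of the example array in the question (04/06/2013 08:30:00 to 04/06/2013 10:00:00). If the column index is taken from a table whose first column holds the times, subtract 1 before passing it in.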
