The Moz Q&A Forum

    Writing A Data Extraction To Web Page Program

    Web Design
    • KempRugeLawGroup

      In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and in the location heading they list the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it posts on our pages automatically. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like it to post immediately on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.

      I want our pages to have something comparable to a stock ticker widget, but for car accidents: one that is specific to each page's location AND combines all the information from the various law enforcement agencies. Any thoughts on how to go about creating this?
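A minimal Python sketch of the "combine and route" part of this request, assuming each agency's data has already been scraped into records with county, city, location, and time fields. The `Accident` fields, the `PAGES` slugs, and the sample crash are hypothetical. Treating two reports with the same location and time as one incident is a simplifying assumption; real agencies will often word the same location differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Accident:
    """One incident as published by one agency (field names are hypothetical)."""
    source: str    # which agency reported it, e.g. "FHP"
    county: str
    city: str
    location: str  # intersection or road segment
    reported: str  # report time exactly as the agency prints it


# Hypothetical mapping from (kind, lowercase name) to a location page slug.
PAGES = {
    ("county", "hillsborough"): "hillsborough-county-car-accident-attorney",
    ("city", "tampa"): "tampa-car-accident-attorney",
}


def route(accidents):
    """Group accidents by destination page, skipping duplicate reports."""
    by_page = defaultdict(list)
    seen = set()
    for a in accidents:
        # Same place and time from two agencies is treated as one incident.
        key = (a.county.lower(), a.city.lower(), a.location.lower(), a.reported)
        if key in seen:
            continue
        seen.add(key)
        # One crash can appear on both its county page and its city page.
        for kind, name in (("county", a.county), ("city", a.city)):
            page = PAGES.get((kind, name.lower()))
            if page is not None:
                by_page[page].append(a)
    return dict(by_page)


# Made-up sample: the same crash reported by two agencies.
feed = [
    Accident("FHP", "Hillsborough", "Tampa", "I-275 at Fowler Ave", "10:42"),
    Accident("Sheriff", "Hillsborough", "Tampa", "I-275 at Fowler Ave", "10:42"),
]
ticker = route(feed)
```

Here `ticker` ends up with one entry for the Hillsborough County page and one for the Tampa page, each holding the single deduplicated crash; the widget on each page would then display only its own list.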

      As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.

      • CleverPhD

        You need to get a developer who understands HTTP requests well. You will need one who knows how to run a spidering program that periodically requests the website, looks for changes, and scrapes data off those sites. The program will also need to check whether the page's code (its HTML structure) has changed, because if it does, the scraping program will need to be rewritten to account for it.
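The two checks described above (has the data changed, and has the page's structure changed) can be sketched in Python with only the standard library. This is an illustration under assumptions, not a tested scraper: the URL is the one from the question and may no longer serve the same content, and the strings in `EXPECTED_MARKERS` are hypothetical stand-ins for whatever text reliably appears on the page while its layout is unchanged.

```python
import hashlib
import urllib.request

URL = "http://www.flhsmv.gov/fhp/traffic/crs_h501.htm"

# Text we assume is present while the page layout is unchanged (hypothetical).
EXPECTED_MARKERS = ("<table", "County", "Location")


def fetch(url=URL, timeout=30):
    """Download the page and decode it using the charset the server declares."""
    req = urllib.request.Request(url, headers={"User-Agent": "accident-ticker/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def check_page(page_html, previous_hash, markers=EXPECTED_MARKERS):
    """Return (new_hash, content_changed, layout_ok) for one polled copy of the page."""
    new_hash = hashlib.sha256(page_html.encode("utf-8")).hexdigest()
    content_changed = new_hash != previous_hash
    layout_ok = all(marker in page_html for marker in markers)
    return new_hash, content_changed, layout_ok
```

A scheduled job would call `fetch()`, pass the result to `check_page()` along with the hash saved from the previous run, re-scrape only when `content_changed` is true, and alert a developer instead of scraping when `layout_ok` is false. Polling on a schedule (every few minutes, not in a tight loop) keeps the load on the agency's server reasonable.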

        Ideally, those sites would have some sort of data API, XML feed, or similar to pull from, but odds are they do not. It would be worth asking, as that would make the programmer's job much easier. It looks like the site is using CMS software from http://www.cts-america.com/. They may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the Department of Motor Vehicles.
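To show why a feed would make the job easier: if an agency did publish XML, reading it takes a few lines with Python's standard `xml.etree.ElementTree` and none of the fragile HTML scraping. The `<incidents>`/`<incident>` element names and the sample record below are entirely made up; no such feed from these agencies is known to exist.

```python
import xml.etree.ElementTree as ET

# Made-up example of what an agency feed *could* look like.
SAMPLE_FEED = """<incidents>
  <incident>
    <county>Hillsborough</county>
    <location>I-275 at Fowler Ave, Tampa</location>
    <reported>10:42</reported>
  </incident>
</incidents>"""


def parse_feed(xml_text):
    """Turn an <incidents> document into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("incident"):
        records.append({
            # findtext returns "" for an empty element and the default for a missing one.
            "county": item.findtext("county", default="").strip(),
            "location": item.findtext("location", default="").strip(),
            "reported": item.findtext("reported", default="").strip(),
        })
    return records


incidents = parse_feed(SAMPLE_FEED)
```

With a real feed, only the element names would need to change. A documented feed format is also far less likely to change without notice than a web page's layout.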

        Good luck, and please do write a post here or a YouMoz post to show the finished product. It should be pretty cool!

        • EGOL

          1. Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table that fits on the webpage where you want it published.

          2. Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions so that the Perl program can execute and the small HTML file can be overwritten.)

          3. Most servers allow you to run files from your /cgi-bin/ on a schedule, such as hourly or daily. These are usually called "cron jobs"; look for this in your server's control panel. Set up a cron job that executes your Perl program automatically.

          4. Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
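The steps above can be sketched in Python instead of Perl; the structure is the same in either language. The sketch assumes the source page lists accidents in an HTML table with one incident per `<tr>`, which has not been verified against the FHP site. The `RowParser` class, the place filter, the CSS class, and the crontab path are all hypothetical. Writing to a temporary file and then renaming it means a visitor never sees a half-written include while the cron job is running.

```python
import html
import os
import tempfile
from html.parser import HTMLParser

# Example crontab line (path is hypothetical) to run the script every 15 minutes:
#   */15 * * * * /usr/bin/python3 /home/site/cgi-bin/accident_ticker.py


class RowParser(HTMLParser):
    """Step 1b: collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (the default) decodes &amp; etc.
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including newlines) inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def rows_for(rows, place):
    """Keep rows in which any cell mentions the place name (case-insensitive)."""
    needle = place.lower()
    return [row for row in rows if any(needle in cell.lower() for cell in row)]


def render(rows):
    """Step 1c: format rows as a small HTML table for the server-side include."""
    lines = ['<table class="accident-ticker">']
    for row in rows:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file and rename it, so the include is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600; let the web server read it
    os.replace(tmp, path)
```

A cron-run script would download the page, feed it to `RowParser`, call `rows_for()` once per city or county page, and pass each `render()` result to `write_atomically()` with a path such as `includes/tampa.html` (hypothetical). On Apache with `mod_include` enabled, the Tampa page could then pull the table in with `<!--#include virtual="/includes/tampa.html" -->`.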

          This setup will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens, you will need to update the URL in the script and/or the way the script parses the page.
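One way to make that failure loud instead of silent is to check the table's header row before using the data and to stop without overwriting the existing include if the header no longer matches. The Python sketch below assumes the scraper produces rows as lists of cell strings with the header row first. The column names in `REQUIRED_COLUMNS` and the `LayoutChanged` exception are hypothetical.

```python
# Column names we assume the source table uses today (hypothetical).
REQUIRED_COLUMNS = ("county", "location")


class LayoutChanged(Exception):
    """Raised when the scraped table no longer looks like what the script expects."""


def rows_as_records(rows, required=REQUIRED_COLUMNS):
    """Map data rows to dicts keyed by header, refusing to guess if the layout moved."""
    if not rows:
        raise LayoutChanged("no table rows found; the URL or page layout may have changed")
    header = [cell.strip().lower() for cell in rows[0]]
    missing = [name for name in required if name not in header]
    if missing:
        raise LayoutChanged(f"expected columns missing: {missing}")
    # Look columns up by name, so a table whose columns were reordered still parses.
    return [dict(zip(header, row)) for row in rows[1:] if len(row) == len(header)]
```

The cron script would catch `LayoutChanged`, leave the last good HTML file in place, and log or email the message so someone knows the script needs updating. A header-only table (no accidents right now) returns an empty list rather than raising an error.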

          © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.