Forum Moderators: phranque
I have an ecommerce database I'm trying to build in Excel for our website. I have never made a database-driven website before. We currently have a website with nearly 800 different product pages that we built manually using FrontPage 2003. We need to change our site completely so that it is database-driven / dynamically builds the product pages. I'm using Excel 2007 to at least try to compile the database, since I've already been using a large Excel workbook to keep track of product pricing, shipping costs, etc. I don't know PHP or ASP just yet, and I'm not sure which one I'll end up using. So I'm starting with what I know how to do - I'm decently familiar with using Excel. And I know I need to build the database at the very least before I can have someone help me figure out how to use PHP or ASP and pull from the database. I want to have the database built in Excel, then exported to CSV and uploaded to a MySQL database online that the PHP or ASP Programming pulls from to build the product pages.
As mentioned, I already have a website now (built using FrontPage 2003, but I've already ordered the upgrade for Expression Web 2 that I should receive very soon). For our current site, we have manually created every product page. I need to convert all those pre-made product pages to have their info. in this same database where I already have the items listed by rows with columns for each of the important things (Long Description with Bullets, Short Description, Title, Price, Shipping Charge, URL for large image, URL for small image, etc.).
I need a way to automate pulling data from our current site's HTML and putting it into an Excel or csv or txt file. There is a lot of info. in the HTML files I don't need (tables info., includes pages, etc.). I would especially need to pull the item title, item description, the bullets with their <ul><li></li></ul> tags (some products have 15-20 bullets of info.), the image URLs, and if possible, the metatag data I've already made.
The only way I know how to do this is to go into FrontPage or our site online for every single product page we already have made and copy / paste the product description into a cell in Excel and then copy / paste the html code for the product's bulleted list into the formula bar of Excel for the next cell of that row, and do the same for the images and the metatag data. Our current website has nearly 800 product pages right now. This will take forever to do it manually. Does anyone know of a way I can automate some or all of this giant task? (currently NOT using a DWT - every page was created manually). Any info. at all will be greatly appreciated. Thank you!
[edited by: phranque at 6:26 am (utc) on July 19, 2008]
[edit reason] No urls, please. See TOS [webmasterworld.com] [/edit]
Failing that, FrontPage and Expression Web aren't going to be any help on their own. This isn't so much a web design problem as a data manipulation problem. The source data is your web pages treated as text.
If the existing pages are similar to each other, the key to success is going be regular expressions that allow you to extract the data you want and leave behind what you don't.
The method that would give you complete control is to write a C++ program to do the extraction. The .NET Framework (library) provides good regular expressions capabilities. It would scan all your site folders for .htm files, use regular expressions to extract the bits you want from each file, and write the data to a tab-delimited text file which you would then import to Excel. The same could also be done in Visual Basic or C# (both using .NET), and even Perl or PHP (using their facilities, not .NET). You can get Express Editions of Microsoft C++, C#, and VB free online.
Before starting to write the program, you could experiment with a tool like GREP to test how successfully regex can be used to pull the desired data out of the files. Developing the correct regex will be the key, and the tricky part.
Edit:
A 2nd alternative that uses regex differently:
Make a copy of the website, and use a long series of FP or EW regular expressions search and replaces to remove from every file all the content that you don't need to keep. When you're finished, each file will only contain data destined for the database.
That still leaves the problem of getting the data into tab-delimited format, and then copying it from the multiple source files into one big text file.
If there is a Windows equivalent of the Linux SED program, that might do the reformatting, and a series of COPY commands (Windows or Linux) would concatenate the files into one big file.
Anyway, those are at least the types of solutions I think are called for here.
[edited by: SteveWh at 10:12 pm (utc) on July 19, 2008]
[edited by: eelixduppy at 11:35 pm (utc) on Oct. 26, 2008]
[edit reason] removed URL [/edit]