Forum Moderators: phranque


Getting around JavaScript to download a file

         

csdude55

3:07 am on Jan 5, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm working on a system that uses a ZIP file found on a government-owned public website. By law, the data is free to the public and can be used for commercial purposes, but the state is more or less trying to discreetly discourage that by making the data hard to access.

On their website, the download link is part of this form:


<form name="_ctl0" method="post" action="stats.aspx" id="_ctl0">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="[3,169-character string ending with ==, probably a key]" />

<script type="text/javascript">
<!--
var theForm = document.forms['_ctl0'];
if (!theForm) {
theForm = document._ctl0;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</script>

<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="AEA4A7A6" />

<a id="datadownloadbutton" href="javascript:__doPostBack('datadownloadbutton','')">here</a>
</form>


When you click the "here" link, the script posts the form back to stats.aspx and the response is a ZIP file that downloads to your PC. What I need, though, is to copy this file from their server to mine every day with a cron job.
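Going by the script quoted above, __doPostBack() only copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and then submits the form, so the "here" link amounts to an ordinary POST of every hidden field to stats.aspx. A minimal JavaScript sketch of that request body follows; the function name postbackBody is hypothetical, and "/wEPDw==" is a made-up stand-in for the real 3,169-character __VIEWSTATE.

```javascript
// Sketch: the body a click on the "here" link would POST to stats.aspx.
// __doPostBack() only fills __EVENTTARGET/__EVENTARGUMENT and submits the
// form, so the request is every hidden field, URL-encoded.
function postbackBody(hiddenFields, eventTarget, eventArgument = "") {
  const body = new URLSearchParams(hiddenFields); // keeps field order
  body.set("__EVENTTARGET", eventTarget); // id of the control that "raised" the event
  body.set("__EVENTARGUMENT", eventArgument);
  return body.toString(); // application/x-www-form-urlencoded
}

// Placeholder values: "/wEPDw==" stands in for the real ViewState string.
const body = postbackBody(
  {
    __EVENTTARGET: "",
    __EVENTARGUMENT: "",
    __VIEWSTATE: "/wEPDw==",
    __VIEWSTATEGENERATOR: "AEA4A7A6",
  },
  "datadownloadbutton",
);
console.log(body);
// → __EVENTTARGET=datadownloadbutton&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDw%3D%3D&__VIEWSTATEGENERATOR=AEA4A7A6
```

Sent with `Content-Type: application/x-www-form-urlencoded`, a body like this should look to the server much like a real click, presumably only as long as the ViewState value is a current one taken from the page.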

I emailed the state webmaster and asked for a direct link. He replied that, while the information is public record, the only way they offer it for download is through the website. His recommendation was to download it manually each day and then upload it to my server, which is obviously a pain.

I tried to recreate the form submission with GET parameters, like so:


http://www.example.com/stats.aspx?__VIEWSTATEGENERATOR=AEA4A7A6&__VIEWSTATE=[url encoded code]


but it just gave me an error page. So I'm assuming the ASPX file is specifically looking for POST parameters, not GET. It could also be that the 3,169-character code is just too long to send in a URL.
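One way to see why the GET attempt failed is to compare its query string with the fields the form actually submits. The URL above never sets __EVENTTARGET, which is how the postback tells the server that the datadownloadbutton link was clicked, so even a server that accepted GET postbacks would have no event to act on. Separately, if the site runs on IIS (likely for an .aspx page), its default request filtering rejects query strings longer than 2,048 bytes, and a 3,169-character value exceeds that even before URL encoding. A small sketch; the helper name and the short "abc" placeholder are hypothetical:

```javascript
// Sketch: list the postback fields a URL's query string is missing.
// The field names are copied from the hidden inputs in the quoted form.
const POSTBACK_FIELDS = [
  "__EVENTTARGET",
  "__EVENTARGUMENT",
  "__VIEWSTATE",
  "__VIEWSTATEGENERATOR",
];

function missingPostbackFields(url) {
  const query = new URL(url).searchParams;
  return POSTBACK_FIELDS.filter((name) => !query.has(name));
}

// "abc" is a placeholder for the URL-encoded ViewState from the post.
const tried =
  "http://www.example.com/stats.aspx?__VIEWSTATEGENERATOR=AEA4A7A6&__VIEWSTATE=abc";
console.log(missingPostbackFields(tried)); // [ '__EVENTTARGET', '__EVENTARGUMENT' ]
```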

So, can you guys think of a way that I can work around this and find the path to the zip file on their server?

TIA!

phranque

10:21 am on Jan 5, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



there's no guarantee the resource is directly accessible to a web request and the information given is probably insufficient to find the answer.
have you tried looking for the filename in google's cache?
something like this search might help find the path:
site:example.com allinurl:"filename.filetype"

csdude55

10:56 am on Jan 5, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the tip, phranque! I wasn't aware of the "allinurl" option, so that's a good one to remember.

Unfortunately, that search didn't find anything. Since it's a government-owned domain, though, there's no guarantee that the file is even on the same domain as stats.aspx.

The ZIP file has a fairly distinctive name, so I searched Google for that exact filename. It returned only one result, and it wasn't a link, just a discussion note that didn't help.

robzilla

6:42 pm on Jan 5, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The long string is probably a session ID (the == ending suggests Base64 encoding), so you'll again have to use something like cURL: first make an HTTP request to the page that contains the form, record the session ID, keep the session alive with a cookie, and then make a POST request to stats.aspx with that session ID/cookie to get the ZIP file. Use your browser's developer tools to find all the variables that go into that POST request.

csdude55

3:21 am on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tried using the Live HTTP Headers add-on to replay the POST request, but all I got was a 404 error. I've only used cURL twice (and one of those times was for what we talked about recently), so your suggestion is a little outside my expertise, but I'll see what I can figure out.

I can't understand why they put the session ID in the <input> tag, though. That would have already been stored as a cookie, wouldn't it? If they were trying to keep people from downloading without a session cookie then it seems like they could have just left it out and made it harder for me to guess. Then again, maybe I should just shut up in case they read this! :-P
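On why the code sits in an <input> tag: in ASP.NET, the hidden __VIEWSTATE field normally holds the page's serialized control state rather than a session ID; the session ID itself usually travels in an ASP.NET_SessionId cookie. Unencrypted ViewState is Base64 of data that begins with the bytes 0xFF 0x01, which is why such values typically start with "/wE". A small sketch that checks for that marker; the function name is hypothetical, and an encrypted ViewState would fail the check even though it is genuine.

```javascript
// Sketch: does a __VIEWSTATE value look like plain (unencrypted) ASP.NET
// ViewState? Its serializer writes the marker bytes 0xFF 0x01 first.
function looksLikeViewState(value) {
  const bytes = Buffer.from(value, "base64");
  return bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0x01;
}

console.log(looksLikeViewState("/wEPDwUKMTIzNDU2Nzg5MA==")); // true
console.log(looksLikeViewState("c2Vzc2lvbi1pZA==")); // false ("session-id")
```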

I've gone through every program and extension I can think of to find the actual path to the ZIP file with no luck. There HAS to be a way, because there's at least one other, larger website that uses the same data, and I can't imagine that they're manually downloading and uploading it every day. Maybe they're doing something with cURL, too. Or maybe they've made a financial arrangement with the state to monopolize the data.

Either way, wish me luck! :-)

tangor

6:04 am on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Meanwhile, if you need the data, do it manually and keep it up to date!

Also remember that gubermints (sic) like google are black boxes often difficult to decipher. :)

Oh... might try looking at the site without js. You might be surprised.

jmccormac

8:00 am on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Perhaps use something like PhantomJS?

[phantomjs.org...]

Regards...jmcc

londrum

10:23 am on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I've gone through every program and extension I can think of to find the actual path to the ZIP file with no luck"

they are probably storing it outside of the web root, to stop people accessing it directly by URL. I do that for a few of the files on my site as well.

robzilla

3:42 pm on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the session is still alive and you emulate the exact POST request, the server shouldn't be able to know the difference, but I'm not too familiar with ASP.NET.

A Google search for "__VIEWSTATE" download file might give you some new ideas.

londrum

3:59 pm on Jan 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The PHP script I serve my file through also checks things like the referrer, to make sure the request comes from one specific page on my site, so maybe they're checking for something like that.
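A referrer check like the one described can be sketched as follows; the function name and URLs are hypothetical. Because the Referer header is supplied by the client, a script replaying the postback can send `Referer: http://www.example.com/stats.aspx` itself, so a check like this on its own would not be expected to stop an automated download.

```javascript
// Sketch of a server-side referrer check: serve the file only when the
// request's Referer points at one specific page (origin + path).
function refererAllowed(refererHeader, allowedPage) {
  if (!refererHeader) return false; // direct requests usually carry no Referer
  try {
    const ref = new URL(refererHeader);
    const allowed = new URL(allowedPage);
    return ref.origin === allowed.origin && ref.pathname === allowed.pathname;
  } catch {
    return false; // malformed header
  }
}

console.log(refererAllowed("http://www.example.com/stats.aspx?x=1", "http://www.example.com/stats.aspx")); // true
console.log(refererAllowed(undefined, "http://www.example.com/stats.aspx")); // false
```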