Forum Moderators: open
I was wondering if someone could help me with a design of a database. Everyday I run certain scripts that update the database with certain information. This information is recorded in my tables with a "date" attached to every entry. The problem with this is that this information is usually the same from day to day, and is just copied in my tables with a new "date" field.
Although this is working for me right now, as you may imagine, the database fills up relatively quickly. I was wondering if it is possible to "track" the changes to the current data everyday, without having to copy the non-changing data everytime.
I, however, want to be able to run queries on the data at any point in history. For example, I want to be able to view the "status" of the database in say Dec 1st, 2005.
If you guys have any suggestions please let me know,
Thanks
Say your tables are now:
locationanddate
-----------
place,date
-----------
us,20080520
us,20080521
uk,20080522
A normalized version of this would be:
location
--------
id_location,place
1,us
2,uk
and a second table
location_dates
----------------
id_location,date
----------------
1,20080520
1,20080521
2,20080522
In this example it looks trivial, but if you have a lot of information rather than just us/uk then it will save space, etc.
Use joins to get at the data.
[databasejournal.com...]
[dev.mysql.com...]
I think I am normalizing the tables, but the problem is slightly different. Take this for example:
user_info table:
user_id
name
info
etc. etc...
location table:
location_id
location
user_location:
user_id
location_id
date
for the user_location table I need to keep history of the locations of the user for each date, and need to be able to query it at any time.
Since there are a lot of users that I update everyday, that table grows very very quickly.
Thanks for the help
Perhaps I am getting the primary key / index structure for the tables all wrong. Would you be able to suggest to me which fields to index, which to make primary / foreign keys, etc.
Maybe that will make my queries faster.
Thanks for the help
To address the storage size issue -
You could make the user_location table hold columns:
user_id
location_id
from_date
to_date
but it would make your queries more complex and I have had query performance issues with this sort of structure before
I also came across this idea:
Have a base table with the "current" data, and a history table that logs all the changes. For example:
Base Table:
pk
data
History Table:
pk
timestamp
oldvalue
change (update, delete, or insert)
When you update, delete, or insert records in the base table, the history table is modified.
The problem I see with this is querying the data. Say you want to query data that is 3 years old. Would it be possible to restore the base table (or create a temp table) to the same status it was 3 years ago and then run queries against it?
Would this idea be inefficient?
Any suggestions are appreciated, Thanks.