Forum Moderators: open

Message Too Old, No Replies

"Best" way to archive data in MySQL / MariaDB

         

csdude55

1:28 am on Nov 17, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm running MariaDB 10.3.32, which I understand is more or less identical to MySQL.

I have a table that's 2.5G in size (2.1G data, 433M index) and has 10.4 million rows. A lot of this data can be safely archived away, though, which I'm hoping will make the main table process queries faster.

How would you suggest moving it in the fastest to process method possible? This is a live table that has inserts and selects 24/7, so I really can't justify any real amount of down time.

My initial thought was to create "archive" with the same structure, then run a query like:

insert into archive
select * from tableA where date <= 20201117000000;

delete from tableA where date <= 20201117000000;


But I think that would probably be the SLOWEST to process! It might even time out on me.

My next thought... copy /var/lib/mysql/example/tableA.frm, tableA.MYD, and tableA.MYI to a /backup directory, rename the files to archive.frm/MYD/MYI, then move them back to /var/lib/mysql and overwrite the existing archive files.

Then go to phpMyAdmin and delete anything in "archive" that I don't want stored in the archive, then delete the archived info from tableA:

delete from archive where date > 20201117000000;
delete from tableA where date <= 20201117000000;


This might still be a lot slower than I want, and I don't love that there's no safety in place to make sure that I'm not deleting something that's not archived.

Any other suggestions?

brotherhood of LAN

10:02 am on Nov 17, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have a look at partitions. You could potentially partition on date and truncate the older partitions rather than deleting with a WHERE statement.

Nanncy

4:52 pm on Nov 17, 2021 (gmt 0)



What about a data loss bug that the entire database breaks down

csdude55

4:52 pm on Nov 17, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Very interesting, @brotherhood_of_LAN! I'm reading up on it now. I've never done it before so I'm sure I'll have questions.

In your experience, how long would it take to process the creation of the new partition on a table this size (2.1G of data, 433M of index)?

@Nanncy, what do you mean?

brotherhood of LAN

9:06 am on Nov 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>how long would it take to process the creation of the new partition on a table this size (2.1G of data, 433M of index)?

I haven't used them in a good number of years, IIRC the documentation on them had an example very similar to your situation. Not sure if I'd create partitions directly on your live table, would test on a separate DB and when all is well migrate your logic/data and rename the tables to use the partitioned one.

For importing with just PK only it shouldn't take much longer than the speed limitations of the disk, then add the indexes when the data is imported.

All really depends on how you use the table. If it's pruning older data then perhaps it'd be a case of exporting to your archive table, pruning the old partition and creating a new one. You can create partitions on the modulus of the day of the year, for example.

csdude55

5:44 pm on Nov 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



After giving it some thought, your suggestion on partitions has me going in a different direction. I think I'll create a new thread about partitions, I'd really appreciate it if you reply to it when you get a chance.

federicorazzoli

8:23 pm on Dec 15, 2021 (gmt 0)



I see you're worried about the duration of the operation. I think it shouldn't be a concern, so I'll focus on that point (you already got good advice on what to do, anyway).

Don't you have a replica? You should, for many reasons. If you do have one, you can do something like this:

* Create partitions / copy rows / whatever in the replica (R1)
* Create a replica of the replica (M -> R1 -> R2) for safety reasons
* Wait until Seconds_behind_master = 0 in R1 to minimise conflict risks
* Point your applications to R1, don't use M anymore
* When you're sure that everything works, wait a couple of day to be prudent, and than decommission M