OK, so the story is like this:
- I have many files (pretty big) that are imported into a datastore
- These files are constantly updated with data, sometimes new, sometimes the same data
- I'm trying to find an algorithm to detect that something has changed for a particular line in a file, so that I can minimize the time spent updating the database
- The way it currently works is that I drop all the data in the database every time and then import it again, but this will not work anymore because when an item ...
- The files contain strings and numbers (titles, orders, prices, etc.)
The only solutions I could think of are:
- Calculate a hash for each row in the database and compare it with the hash of the corresponding line from the file; if they differ, update the database
- Keep two copies of the files, the previous ones and the current ones, run diffs on them (which is probably faster than updating the DB), and update the DB based on those diffs
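To make the second idea concrete, here is a minimal sketch that compares a previous and a current snapshot and keeps only the lines that are new or changed. The function name `changed_lines` and the sample rows are made up for illustration, and the sketch assumes the previous snapshot fits in memory, which may not hold for very large files.

```python
def changed_lines(previous_lines, current_lines):
    """Return lines of the current snapshot that are absent from the previous one.

    A changed line shows up here because its new text was not seen before.
    Lines that were deleted from the file are not reported.
    """
    seen = set(previous_lines)
    return [line for line in current_lines if line not in seen]


# Hypothetical sample data: id, title, price
previous = ["1,Widget,9.99", "2,Gadget,19.99"]
current = ["1,Widget,9.99", "2,Gadget,24.99", "3,Gizmo,4.50"]

delta = changed_lines(previous, current)
for line in delta:
    print(line)
```

Only the lines in `delta` would then need to be sent to the database. Storing a digest of each previous line instead of the line itself would reduce memory use and brings this close to the first idea.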
Since the amount of data is huge, I am out of options for now. In the long run, I will get rid of the files and the data will be sent directly to the database, but the problem still remains.
Any advice will be appreciated.
Problem definition as I understand it.
Assume that your file contains rows of the form id, name, age.
As you have said, rows can be added or updated, so the file becomes:
2, Tim, 35 - to update
3, Kim, 40 - to leave as is
4, Jim, 30 - to insert
The requirement now is to update the database by inserting or updating only those two records, using either two SQL queries or one batch query containing two SQL statements.
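As a rough illustration of sending just the two changed records in one batch, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the real database. The table name `people`, its columns, and Tim's earlier age of 30 are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the real datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (id, name, age) VALUES (?, ?, ?)",
    [(2, "Tim", 30), (3, "Kim", 40)],  # assumed previous state
)
conn.commit()

# One transaction containing exactly two statements: one update, one insert.
# The unchanged record (id 3) is never touched.
with conn:
    conn.execute("UPDATE people SET name = ?, age = ? WHERE id = ?", ("Tim", 35, 2))
    conn.execute("INSERT INTO people (id, name, age) VALUES (?, ?, ?)", (4, "Jim", 30))

rows = conn.execute("SELECT id, name, age FROM people ORDER BY id").fetchall()
print(rows)
```

Using `with conn:` commits both statements together, or rolls both back if either one fails.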
I am making the following assumptions here:
- You cannot modify the existing process that creates the files.
- You are using some batch processing [reading from file - processing in memory - writing to DB] to upload the data into the database.
Store the hash of each record [name, age] against its id in an in-memory map, where the id is the key and the hash is the value [if you need scalability, use Hazelcast].
Your batch framework that loads the data [again assuming one line of the file is treated as one record] needs to check the computed hash value against the id in the in-memory map. The initial map can also be created by using your batch framework to read the file.
If the id exists, compare the hashes:
- same hash: discard the record
- different hash: create an update SQL statement
If the id does not exist in the in-memory map, create an insert SQL statement and store the hash value in the map.
You can also go for parallel processing, chunk processing and in-memory data partitioning using Spring Batch and Hazelcast.
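The check described above could look roughly like the following sketch. It uses a plain Python dict in place of a Hazelcast map and no batch framework at all. The function names `record_hash` and `classify`, the comma-separated line format, and the initial rows are assumptions for illustration only.

```python
import hashlib


def record_hash(fields):
    """Hash the non-key fields of a record, e.g. [name, age]."""
    joined = "\x1f".join(fields)  # unit separator avoids ambiguous joins
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify(lines, hashes):
    """Split file lines into inserts and updates; unchanged lines are discarded.

    `hashes` maps id -> hash of the record and is updated in place.
    """
    inserts, updates = [], []
    for line in lines:
        record_id, *fields = [part.strip() for part in line.split(",")]
        digest = record_hash(fields)
        known = hashes.get(record_id)
        if known == digest:
            continue  # id exists and hash is the same: discard
        if known is None:
            inserts.append((record_id, fields))  # id not present: insert
        else:
            updates.append((record_id, fields))  # hash differs: update
        hashes[record_id] = digest
    return inserts, updates


hashes = {}
# First run builds the map (assumed initial file contents).
first_inserts, first_updates = classify(["2, Tim, 30", "3, Kim, 40"], hashes)
# Second run sees one changed row, one unchanged row and one new row.
inserts, updates = classify(["2, Tim, 35", "3, Kim, 40", "4, Jim, 30"], hashes)
print(inserts, updates)
```

The `inserts` and `updates` lists would then be turned into the insert and update SQL statements. Keeping only a digest per id means the map stays small compared with the files themselves.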
Hope it helps