How to modify a single file inside a very large zip without re-writing the entire zip?

<h3>Question</h3>

I have large zip files that contain huge files. There are "metadata" text files within the zip archives that need to be modified. However, it is not possible to extract the entire zip and re-compress it. I need to locate the target text file inside the zip, edit it, and possibly append the change to the zip file. The file name of the text file is always the same, so it can be hard-coded. Is this possible? Is there a better way?


<h3>Answer1:</h3>

There are two approaches. First, if you're just trying to avoid recompressing the entire zip file, you can use any existing zip utility to update a single file in the archive. This effectively copies the entire archive into a new one with the replaced entry, then deletes the old zip file. The entries that are not being replaced are not recompressed, so it should be relatively fast: roughly the time required to copy the zip archive.

If you want to avoid copying the entire zip file, you can effectively delete the entry you want to replace by changing its name in both the local and central headers of the zip file (keeping the name the same length) to a name that you won't use otherwise and that indicates that the file should be ignored, e.g. by replacing the first character of the name with a tilde. Then you can append a new entry with the updated text file. This requires rewriting the central directory at the end of the zip file, which is pretty small.
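A minimal Python sketch of this rename-and-append approach, assuming the entry name starts with a single-byte ASCII character and the archive is an ordinary file on disk. The function name `replace_entry_in_place` and the `~` marker are illustrative only. The sketch leans on how CPython's `zipfile` module behaves in append mode (on close it rewrites the central directory from its in-memory entry list), and `NameToInfo` is an undocumented attribute, so treat it as a starting point rather than a supported API, and try it on a copy first.

```python
import zipfile


def replace_entry_in_place(zip_path, name, new_data, marker="~"):
    """Hide the old entry under a same-length name, then append a new one."""
    dead_name = marker + name[1:]  # same length as the original name

    # 1. Look up where the old entry's local header starts.
    with zipfile.ZipFile(zip_path, "r") as zf:
        header_offset = zf.getinfo(name).header_offset

    # 2. Patch the first byte of the name in the local header.
    #    The fixed part of a local header is 30 bytes; the name follows it.
    with open(zip_path, "r+b") as raw:
        raw.seek(header_offset)
        if raw.read(4) != b"PK\x03\x04":
            raise ValueError("no local file header at the expected offset")
        raw.seek(header_offset + 30)
        raw.write(marker.encode("ascii"))

    # 3. Append the new entry. Assumption: on close, CPython's zipfile writes
    #    the central directory from the ZipInfo objects it holds in memory,
    #    so renaming the old ZipInfo here renames its central record too.
    with zipfile.ZipFile(zip_path, "a") as zf:
        old = zf.getinfo(name)
        old.filename = dead_name
        # Undocumented attribute: keep the name index in step with the rename.
        zf.NameToInfo[dead_name] = zf.NameToInfo.pop(name)
        # Small metadata can be stored without compression.
        zf.writestr(name, new_data, compress_type=zipfile.ZIP_STORED)
```

Only the patched byte, the new entry, and the central directory are written; the large entries are never read or copied. The renamed entry's data stays in the archive as dead weight.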

(A suggestion in another answer to not refer to the unwanted entry in the central directory will not necessarily work, depending on the utility being used to read the zip file. Some utilities will read the local headers for the zip file entry information, and ignore the central directory. Other utilities will do the opposite. So the local and central entry information should be kept in sync.)


<h3>Answer2:</h3>
<blockquote>

There are "metadata" text files within the zip archives that need to be modified. However, it is not possible to extract the entire zip and re-compress it.

</blockquote>

This is a good lesson in why, when dealing with huge datasets, keeping the metadata in the same place as the data is a bad idea.

The .zip file format isn't particularly complicated, and it is definitely possible to replace something inside it. The problem is that the size of the new data might increase, and not fit anymore into the location of the old data. Thus there is no standard routine or tool to accomplish that.

If you are skilled enough, you can in theory write your own zip handling functions to provide the "file replace" routine. If it is about the (smallish) metadata, you do not even need to compress it. The .zip's "central directory" is located at the end of the file, after the compressed data (the format was optimized for appending new files). The general concept is: read the "central directory" into memory, append the new modified file after the compressed data, update the central directory in memory with the new file offset of the modified file, and write the central directory back after the modified file. (The old file would still be sitting somewhere inside the .zip, but would no longer be referenced by the "central directory".) All the operations happen at the end of the file, without touching the rest of the archive's content.
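The concept above can be sketched in Python without hand-written zip parsing, assuming CPython's `zipfile` module in append mode: it loads the central directory into memory, writes new entries where the old central directory began, and writes the directory back on close. The function name `append_and_unlink` is made up for illustration, and `filelist` and `NameToInfo` are undocumented attributes, so this is a sketch that relies on implementation details rather than a supported recipe.

```python
import zipfile


def append_and_unlink(zip_path, name, new_data):
    """Append a new copy of `name` and drop the old one from the directory."""
    with zipfile.ZipFile(zip_path, "a") as zf:
        old = zf.getinfo(name)
        # Assumption: the central directory written on close is built from
        # zf.filelist, so removing the old ZipInfo leaves its bytes in the
        # archive but no longer referenced.
        zf.filelist.remove(old)
        del zf.NameToInfo[name]
        # The new entry is written at the end; stored, not compressed.
        zf.writestr(name, new_data, compress_type=zipfile.ZIP_STORED)
```

The archive grows by the size of the new entry each time this runs, because the old copy is never reclaimed.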

But practically speaking, I would recommend simply keeping the data and the metadata separate.

Source: https://stackoverflow.com/questions/34258649/how-to-modify-a-single-file-inside-a-very-large-zip-without-re-writing-the-entir
