fseek to a 32-bit unsigned offset

Question:

I am reading a file format (TIFF) that has 32-bit unsigned offsets from the beginning of the file.

Unfortunately the prototype for fseek, the usual way I would go to particular file offset, is:

int fseek ( FILE * stream, long int offset, int origin );

so the offset is signed. How should I handle this situation? Should I be using a different function for seeking?

Answer1:

You can try lseek64() (<a href="https://linux.die.net/man/3/lseek64" rel="nofollow">man page</a>):

#define _LARGEFILE64_SOURCE /* See feature_test_macros(7) */
#include <sys/types.h>
#include <unistd.h>

off64_t lseek64(int fd, off64_t offset, int whence);

where the file descriptor is obtained from the stream:

int fd = fileno (stream);

Notes from <a href="https://www.gnu.org/software/libc/manual/html_node/File-Position-Primitive.html" rel="nofollow">The GNU C lib - Setting the File Position of a Descriptor</a>:

<blockquote>

This function is similar to the lseek function. The difference is that the offset parameter is of type off64_t instead of off_t which makes it possible on 32 bit machines to address files larger than 2^31 bytes and up to 2^63 bytes. The file descriptor filedes must be opened using open64 since otherwise the large offsets possible with off64_t will lead to errors with a descriptor in small file mode.

When the source file is compiled with _FILE_OFFSET_BITS == 64 on a 32 bits machine this function is actually available under the name lseek and so transparently replaces the 32 bit interface.

</blockquote>

About fd and stream, from <a href="https://www.gnu.org/software/libc/manual/html_node/Streams-and-File-Descriptors.html" rel="nofollow">Streams and File Descriptors</a>:

<blockquote>

Since streams are implemented in terms of file descriptors, you can extract the file descriptor from a stream and perform low-level operations directly on the file descriptor. You can also initially open a connection as a file descriptor and then make a stream associated with that file descriptor.

</blockquote>
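As a rough sketch of the descriptor-based route described above: the helper below seeks with lseek and then reads with read on the same descriptor. The name read_at_u32 is made up for illustration, and the code assumes a glibc-style system where defining _FILE_OFFSET_BITS as 64 makes off_t (and plain lseek) 64 bits wide, as the quoted manual text says.

```c
#define _FILE_OFFSET_BITS 64 /* assumption: glibc-style large-file support;
                                must come before every #include */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Fail at compile time if off_t cannot hold every uint32_t offset. */
_Static_assert(sizeof(off_t) >= 8, "64-bit off_t required");

/* Hypothetical helper (not part of any library): move the descriptor to an
 * absolute 32-bit unsigned offset, then read up to len bytes into buf.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_at_u32(int fd, uint32_t offset, void *buf, size_t len)
{
    assert(buf != NULL);

    /* Widening uint32_t to a 64-bit off_t never goes negative. */
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1)
        return -1;

    return read(fd, buf, len);
}
```

A caller holding a FILE * would pass fileno(stream) as fd. Since lseek and read bypass the stream's own buffer, it is probably safest not to interleave them with fread on the same stream.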

Answer2:

After studying this question more deeply and considering the other comments and answers (thank you), I think the simplest approach is to do two seeks if the offset is greater than 2147483647 (2^31 - 1, the most a 32-bit signed long can hold). This lets me keep the offsets as uint32_t and continue using fseek. The positioning code looks like this:

// note: error handling code omitted
uint32_t offset = ... (whatever it is)
if( offset > 2147483647 ){
    fseek( file, 2147483647, SEEK_SET );
    fseek( file, (long int)( offset - 2147483647 ), SEEK_CUR );
} else {
    fseek( file, (long int) offset, SEEK_SET );
}

The problem with using 64-bit types is that the code might be running on a 32-bit architecture (among other things). There is also fsetpos, which uses an fpos_t to manage arbitrarily large offsets, but fpos_t is opaque and is only meaningful when filled in by fgetpos, so it cannot simply be set to a number read from the file. fsetpos might make sense if I were handling offsets of arbitrary size, but since the largest possible offset fits in a uint32_t, the double seek meets the need.

Note that this solution allows all TIFF files to be handled on a 32-bit system. The advantage is obvious if you consider commercial programs like PixInsight, which can only handle TIFF files smaller than 2147483648 bytes when running on 32-bit systems. To handle full-sized TIFF files, a user has to run the 64-bit version of PixInsight on a 64-bit computer. This is probably because the PixInsight programmers used a 64-bit type to handle the offsets internally. Since my solution only uses 32-bit types, I can handle full-sized TIFF files on a 32-bit system (as long as the underlying operating system can handle files that large).
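For reference, here is a sketch of the double seek from this answer with the omitted error handling filled in. The helper name seek_u32 and the SPLIT constant are illustrative, not from any library. Two details are worth checking on a real 32-bit target: POSIX allows fseek to fail with EOVERFLOW when the resulting position does not fit in a long, and an offset of exactly 4294967295 leaves a remainder of 2147483648, which a 32-bit long cannot hold, so this sketch rejects that one value instead of casting it.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Split point used in the answer: 2^31 - 1. */
#define SPLIT UINT32_C(2147483647)

/* Hypothetical helper (name is illustrative): move `file` to an absolute
 * 32-bit unsigned offset using at most two fseek calls.
 * Returns 0 on success, -1 if a seek fails or the remainder cannot be
 * expressed as a long. */
int seek_u32(FILE *file, uint32_t offset)
{
    assert(file != NULL);

    if (offset <= SPLIT)
        return fseek(file, (long)offset, SEEK_SET) != 0 ? -1 : 0;

    /* Remainder after the first hop; at most 2147483648. */
    uint32_t rest = offset - SPLIT;
    if (rest > LONG_MAX)
        return -1; /* only possible when long is 32 bits wide */

    /* First hop: as far as a signed 32-bit long can reach. */
    if (fseek(file, (long)SPLIT, SEEK_SET) != 0)
        return -1;

    /* Second hop: the remainder, relative to the current position. */
    return fseek(file, (long)rest, SEEK_CUR) != 0 ? -1 : 0;
}
```

Checking both return values means a C library that refuses to move past 2^31 - 1 surfaces as an error rather than as reads from the wrong place.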
