Media Dataset
This page describes the fields in the media CSV file. This data is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Field Details
- irn - a unique numeric identifier used by the PrideNZ content management system
- media_reference - a six-digit number referring to a particular event. Note that a single media_reference number sometimes covers multiple recordings, for example individual interviews conducted at the same physical event. Also note that only media items recorded or processed by PrideNZ have a reference number; media from external websites does not
- media_source - the source of the media item. This may (or may not) reflect the actual rights holder, but it is a good starting point for identifying rights information
- url - URL of the related PrideNZ web page
- production_day - day of production. Note that the exact production date is sometimes unknown; approximate dates are given in the description field
- production_month - month of production. Note that the exact production date is sometimes unknown; approximate dates are given in the description field
- production_year - year of production. Note that the exact production date is sometimes unknown; approximate dates are given in the description field
- recording_type - broad categorisation of this media item (event, presentation, performance etc.)
- series - overarching series (if any) relating to this media item
- sub_series - sub-series (if any) relating to this media item
- title - title of this media item
- description - description of this media item
- summary_computer_generated - a 300 to 400 word summary created by generative AI from the audio transcription of this media item
- interviewer - interviewer (if any) relating to this media item
- voices - voices identified in this media item. Voices are delimited by a semi-colon
- tags - tags that have been manually associated with this media item. Tags are delimited by a semi-colon
- tags_computer_generated - tags that have been automatically associated with this media item. Tags are delimited by a semi-colon
- location_name - location name associated with this media item, e.g. National Library
- location - location associated with this media item, e.g. 70 Molesworth Street, Thorndon
- broader_location - broader location associated with this media item, e.g. Wellington
- location_lat - decimal latitude coordinate for mapping purposes
- location_long - decimal longitude coordinate for mapping purposes
- precise_locality - true/false. Indicates whether the location is a precise locality (true) or a broader area (false)
- media_type - describes the type of media item, e.g. audio/video/image
- atl_ref - Alexander Turnbull Library reference (if item is deposited with ATL)
- atl_url - URL of the Alexander Turnbull Library record (if item is deposited with ATL)
- media_url - URL of the media file
- media_filesize - size of the media file, e.g. 34.2MB. This information is not always present
- media_duration - length of the media file in h:mm:ss (or m:ss for shorter items), e.g. 1:23:01 or 5:21. This information is not always present
- media_hires_url - URL of the high-resolution media file
- media_hires_filesize - size of the high-resolution media file, e.g. 34.2MB. This information is not always present
- media_hires_duration - length of the high-resolution media file in h:mm:ss (or m:ss for shorter items), e.g. 1:23:01 or 5:21. This information is not always present
- thumb_url - URL of a generic PrideNZ thumbnail image
- plaintext_url - URL of a plain-text version of this media item (useful for generative AI applications)
- metadata_url - URL of an HTML file containing metadata for this media item
- timestamp - timestamp derived from the production date. Use this field to sort items from earliest to latest production date. Note that if the production date is only partial (e.g. month and year without a day), the timestamp defaults to the first day of the month
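Several columns pack structured values into plain strings, so some parsing is usually needed after loading the file. The following is a minimal sketch using Python's standard csv module. It assumes the file is UTF-8 encoded, that a two-part duration such as 5:21 means m:ss, and that file sizes use decimal units (1MB = 1,000,000 bytes) with KB/MB/GB suffixes. The helper names (split_multi, duration_to_seconds, filesize_to_bytes, parse_rows, load_media) are hypothetical.

```python
import csv
import re

# Columns whose values are delimited by a semicolon.
MULTI_VALUE_FIELDS = ("voices", "tags", "tags_computer_generated")

# Assumption: decimal units, so "34.2MB" means 34,200,000 bytes.
SIZE_UNITS = {"KB": 1000, "MB": 1000**2, "GB": 1000**3}


def split_multi(value):
    """Split a semicolon-delimited field into a list, dropping blank entries."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def duration_to_seconds(value):
    """Convert '1:23:01' (h:mm:ss) or '5:21' (assumed m:ss) to seconds.

    Returns None for blank or unparseable values, since durations are optional.
    """
    if not value or not value.strip():
        return None
    try:
        parts = [int(p) for p in value.strip().split(":")]
    except ValueError:
        return None
    seconds = 0
    for part in parts:  # fold each colon-separated unit into the running total
        seconds = seconds * 60 + part
    return seconds


def filesize_to_bytes(value):
    """Convert a size such as '34.2MB' to an integer byte count, or None."""
    if not value:
        return None
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]B)\s*", value, re.IGNORECASE)
    if match is None:
        return None
    number, unit = match.groups()
    return round(float(number) * SIZE_UNITS[unit.upper()])


def parse_rows(lines):
    """Parse CSV lines (any iterable of strings) into dicts with parsed extras."""
    rows = []
    for row in csv.DictReader(lines):
        for field in MULTI_VALUE_FIELDS:
            row[field] = split_multi(row.get(field))
        row["media_duration_seconds"] = duration_to_seconds(row.get("media_duration"))
        row["media_filesize_bytes"] = filesize_to_bytes(row.get("media_filesize"))
        rows.append(row)
    return rows


def load_media(path):
    """Load the media CSV file from disk (assumed UTF-8)."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)
```

Blank or unrecognised durations and sizes come back as None rather than raising, because both fields are documented as not always present.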
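The partial-date rule described for the timestamp field can be mirrored when working from the production_year, production_month, and production_day columns directly, for example to check where a row with an incomplete date will sort. This sketch assumes those columns hold numeric strings (blank when unknown). Defaulting a missing month to January and placing rows with no year last are assumptions not stated in the field notes, and the function names are hypothetical.

```python
from datetime import date


def _to_int(value):
    """Return value as an int, or None if it is blank or non-numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def production_date(row):
    """Build a date from the production_* columns, or None if the year is unknown.

    A missing day defaults to 1, matching the documented timestamp rule.
    A missing month also defaults to 1 (an assumption; not documented).
    """
    year = _to_int(row.get("production_year"))
    if year is None:
        return None
    month = _to_int(row.get("production_month")) or 1
    day = _to_int(row.get("production_day")) or 1
    return date(year, month, day)


def sort_key(row):
    """Earliest production date first; rows without a year sort last."""
    produced = production_date(row)
    return (produced is None, produced or date.min)


def sort_chronologically(rows):
    """Return a new list of rows ordered by reconstructed production date."""
    return sorted(rows, key=sort_key)
```

Where the timestamp column is available, sorting on it directly is the documented approach; this reconstruction is only a cross-check.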