public class AvroParquetInputFormat extends ParquetInputFormat<org.apache.avro.generic.IndexedRecord>
An InputFormat for Parquet files.

Fields inherited from class ParquetInputFormat: READ_SUPPORT_CLASS, UNBOUND_RECORD_FILTER

| Constructor and Description |
|---|
| AvroParquetInputFormat() |

| Modifier and Type | Method and Description |
|---|---|
| static void | setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema) Override the Avro schema to use for reading. |
| static void | setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection) Set the subset of columns to read (projection pushdown). |
Methods inherited from class ParquetInputFormat: createRecordReader, getFooters, getFooters, getGlobalMetaData, getReadSupport, getReadSupportClass, getSplits, getSplits, getUnboundRecordFilter, listStatus, setReadSupportClass, setReadSupportClass, setUnboundRecordFilter

Methods inherited from class FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

public static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)
Set the subset of columns to read (projection pushdown). This is useful when the full schema is large and only a few columns are needed, since columns outside the projection are not read at all, which saves time.

If a requested projection is set, the Avro schema used for reading must be compatible with it: every column left out of the projection must either be absent from the read schema or be optional in it. Use setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema) to set a read schema if needed.
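The compatibility rule above can be modeled in a few lines. This is a hypothetical sketch, not part of parquet-avro: `ProjectionCheck`, `FieldSpec`, and `isCompatible` are illustrative names, and a read schema is reduced to a list of field names with an "optional" flag instead of a real `org.apache.avro.Schema`.

```java
import java.util.List;
import java.util.Set;

public class ProjectionCheck {
    // Hypothetical stand-in for one field of an Avro read schema:
    // its name and whether it is optional (e.g. a union with null).
    public record FieldSpec(String name, boolean optional) {}

    // Returns true if every read-schema field missing from the projection
    // is optional, mirroring the rule described above.
    public static boolean isCompatible(Set<String> projection, List<FieldSpec> readSchema) {
        for (FieldSpec f : readSchema) {
            if (!projection.contains(f.name()) && !f.optional()) {
                return false; // required field whose column will never be read
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> projection = Set.of("name");
        List<FieldSpec> ok = List.of(new FieldSpec("name", false), new FieldSpec("age", true));
        List<FieldSpec> bad = List.of(new FieldSpec("name", false), new FieldSpec("age", false));
        System.out.println(isCompatible(projection, ok));  // true
        System.out.println(isCompatible(projection, bad)); // false
    }
}
```

A field that is projected out but still required in the read schema has no column to supply its value, which is why the check rejects it.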
Parameters:
job - the job to configure
requestedProjection - an Avro schema describing the columns to read
See Also:
setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)

public static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)
Override the Avro schema to use for reading. Differences between the read schema and the write schema are resolved using Avro's schema resolution rules.
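For record schemas, the Avro specification's resolution rules say that a writer field absent from the reader schema is ignored, and a reader field absent from the writer schema takes its default value, or resolution fails if it has no default. The sketch below models only that record-level rule; `RecordResolution`, `ReaderField`, and `resolve` are hypothetical names, values are plain `Object`s, and type promotion, unions, and aliases are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordResolution {
    // Hypothetical reader-side field: a name plus an optional default
    // (hasDefault distinguishes "no default" from a default of null).
    public record ReaderField(String name, boolean hasDefault, Object defaultValue) {}

    // Builds the record as seen through the reader schema.
    public static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField f : readerFields) {
            if (written.containsKey(f.name())) {
                result.put(f.name(), written.get(f.name())); // present in both schemas
            } else if (f.hasDefault()) {
                result.put(f.name(), f.defaultValue()); // missing from writer: use default
            } else {
                throw new IllegalArgumentException("no value or default for field " + f.name());
            }
        }
        // Writer fields not named by the reader are never copied, i.e. ignored.
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = Map.of("name", "alice", "legacyId", 7);
        List<ReaderField> reader = List.of(
            new ReaderField("name", false, null),
            new ReaderField("age", true, null));
        System.out.println(resolve(written, reader)); // {name=alice, age=null}
    }
}
```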
Parameters:
job - the job to configure
avroReadSchema - the Avro schema to use for reading
See Also:
setRequestedProjection(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)

Copyright © 2015. All rights reserved.