Image Data Source

The image data source loads image files from a directory. It decodes compressed images (JPEG, PNG, etc.) into a raw image representation using ImageIO from the Java library. The loaded DataFrame has one StructType column, "image", which holds the image data in the image schema.
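To make the decoding step concrete, here is a minimal sketch of turning compressed image bytes into raw BGR bytes with ImageIO. It is not Spark's implementation: the object name `ImageIoDecodeSketch` and the helper `decodeToBgr` are hypothetical, and the sketch assumes 3-channel, 8-bit output only.

```scala
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO

// Hypothetical helper for illustration only; not part of Spark.
object ImageIoDecodeSketch {
  /** Decodes compressed image bytes into (height, width, row-wise BGR bytes).
    * Returns None when ImageIO has no reader for the input. */
  def decodeToBgr(compressed: Array[Byte]): Option[(Int, Int, Array[Byte])] = {
    // ImageIO.read returns null when no registered reader can decode the stream.
    val decoded = ImageIO.read(new ByteArrayInputStream(compressed))
    Option(decoded).map { image =>
      val h = image.getHeight
      val w = image.getWidth
      val out = new Array[Byte](h * w * 3)
      var offset = 0
      for (y <- 0 until h; x <- 0 until w) {
        val argb = image.getRGB(x, y)                  // packed as 0xAARRGGBB
        out(offset)     = (argb & 0xFF).toByte         // blue
        out(offset + 1) = ((argb >> 8) & 0xFF).toByte  // green
        out(offset + 2) = ((argb >> 16) & 0xFF).toByte // red
        offset += 3
      }
      (h, w, out)
    }
  }
}
```

Returning None for undecodable input mirrors the idea behind the dropInvalid option, which skips files that are not valid images.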

The schema of the image column is:

origin: StringType (represents the file path of the image)
height: IntegerType (height of the image)
width: IntegerType (width of the image)
nChannels: IntegerType (number of image channels)
mode: IntegerType (OpenCV-compatible type)
data: BinaryType (Image bytes in OpenCV-compatible order: row-wise BGR in most cases)
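The row-wise BGR layout of the data field can be illustrated with a small pixel lookup. This is a sketch under stated assumptions: 8-bit channels, at least three channels per pixel, and the BGR ordering described above. The names `PixelAccessSketch` and `bgrAt` are hypothetical and not part of Spark.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object PixelAccessSketch {
  /** Returns (blue, green, red) as unsigned values for the pixel at (row, col),
    * assuming row-wise storage with nChannels bytes per pixel. */
  def bgrAt(data: Array[Byte], width: Int, nChannels: Int, row: Int, col: Int): (Int, Int, Int) = {
    require(nChannels >= 3, "this sketch expects at least three channels (BGR)")
    // Rows are stored one after another, so skip `row` full rows and then `col` pixels.
    val base = (row * width + col) * nChannels
    // JVM bytes are signed; mask with 0xFF to recover the 0..255 channel value.
    (data(base) & 0xFF, data(base + 1) & 0xFF, data(base + 2) & 0xFF)
  }
}
```

For a 2x2 image with three channels, `PixelAccessSketch.bgrAt(data, 2, 3, 1, 1)` reads the three bytes starting at index 9.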
// Load every image in the directory; dropInvalid = true skips files that are not valid images.
val df = spark.read.format("image").option("dropInvalid", true).load("/home/dv6/spark/spark/data/mllib/images/origin/kittens")
// Show selected fields of the image struct without truncating the long file paths.
df.select("image.origin", "image.width", "image.height").show(truncate=false)

/*
Output:

+-------------------------------------------------------------------------------------+-----+------+
|origin                                                                               |width|height|
+-------------------------------------------------------------------------------------+-----+------+
|file:///home/dv6/spark/spark/data/mllib/images/origin/kittens/54893.jpg              |300  |311   |
|file:///home/dv6/spark/spark/data/mllib/images/origin/kittens/DP802813.jpg           |199  |313   |
|file:///home/dv6/spark/spark/data/mllib/images/origin/kittens/29.5.a_b_EGDP022204.jpg|300  |200   |
|file:///home/dv6/spark/spark/data/mllib/images/origin/kittens/DP153539.jpg           |300  |296   |
+-------------------------------------------------------------------------------------+-----+------+
*/