public class WebLogAnalysis extends Object
This program implements the following relational query:
SELECT
       r.pageURL,
       r.pageRank,
       r.avgDuration
FROM Documents d JOIN Rankings r
                  ON d.url = r.pageURL
WHERE CONTAINS(d.contents, [keywords])
      AND r.pageRank > [rank]
      AND NOT EXISTS
          (
             SELECT * FROM Visits v
             WHERE v.destURL = d.url
                   AND v.visitDate < [date]
          );
Input files are plain-text CSV files that use the pipe character ('|') as the field separator. The tables referenced in the query can be generated using org.apache.flink.examples.java.relational.util.WebLogDataGenerator and have the following schemas:
CREATE TABLE Documents (
                url VARCHAR(100) PRIMARY KEY,
                contents TEXT );

CREATE TABLE Rankings (
                pageRank INT,
                pageURL VARCHAR(100) PRIMARY KEY,
                avgDuration INT );

CREATE TABLE Visits (
                sourceIP VARCHAR(16),
                destURL VARCHAR(100),
                visitDate DATE,
                adRevenue FLOAT,
                userAgent VARCHAR(64),
                countryCode VARCHAR(3),
                languageCode VARCHAR(6),
                searchWord VARCHAR(32),
                duration INT );
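As a rough illustration of the input format, the following plain-Java sketch parses one pipe-separated line into a Rankings row. It does not use Flink's CSV reader. The class name RankingLineParser, the sample line, and the assumption that fields appear in the column order of the schema above are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Flink example.
class RankingLineParser {

    /** One row of the Rankings table: pageRank | pageURL | avgDuration. */
    static final class Ranking {
        final int pageRank;
        final String pageURL;
        final int avgDuration;

        Ranking(int pageRank, String pageURL, int avgDuration) {
            this.pageRank = pageRank;
            this.pageURL = pageURL;
            this.avgDuration = avgDuration;
        }
    }

    /** Parses a line such as "30|url_4|8" (field order assumed from the schema). */
    static Ranking parse(String line) {
        // '|' is a regex metacharacter, so it must be quoted before splitting.
        // The limit of -1 keeps trailing empty fields, so a line that ends
        // with a separator still yields at least three fields.
        String[] fields = line.split(Pattern.quote("|"), -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected 3 fields but got: " + line);
        }
        return new Ranking(
                Integer.parseInt(fields[0].trim()),
                fields[1].trim(),
                Integer.parseInt(fields[2].trim()));
    }

    public static void main(String[] args) {
        Ranking r = parse("30|url_4|8");
        // prints "url_4 rank=30 avgDuration=8"
        System.out.println(r.pageURL + " rank=" + r.pageRank + " avgDuration=" + r.avgDuration);
    }
}
```

Quoting the separator matters here: an unquoted "|" in String.split is the regex alternation operator and would split the line between every character.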
Usage
WebLogAnalysis --documents <path> --ranks <path> --visits <path> --output <path>
If no parameters are provided, the program is run with default data from WebLogData.
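The command-line convention above can be sketched in plain Java as follows. This is only an illustration of "--key value" parsing with a fallback to default data. The class WebLogArgs and its methods are hypothetical and are not the parser the example itself uses, and the per-input fallback shown here is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Flink example.
class WebLogArgs {

    /** Collects "--key value" pairs, e.g. "--documents /path/to/docs". */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key but got: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            // Strip the leading "--" and consume the following value.
            params.put(args[i].substring(2), args[++i]);
        }
        return params;
    }

    /** Returns the given path, or a marker for the built-in data when the key is absent. */
    static String describeSource(Map<String, String> params, String key) {
        // Assumption: each input falls back to the default data independently.
        return params.containsKey(key) ? params.get(key) : "default data (WebLogData)";
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        for (String key : new String[] {"documents", "ranks", "visits"}) {
            System.out.println(key + ": " + describeSource(params, key));
        }
    }
}
```

Run without arguments, the sketch reports the default data for all three inputs, which mirrors the behavior described above.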
This example shows how to use:
- tuple data types
- projection and join projection
- the CoGroup transformation for an anti-join
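The anti-join idea behind the last item can be sketched without Flink: group both inputs by key, then keep a left-hand group only when the matching right-hand group is empty. The plain-Java code below illustrates that logic on in-memory lists. The class AntiJoinSketch and the sample URLs are hypothetical, and the code is not the CoGroup function used by this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Flink example.
class AntiJoinSketch {

    /** Returns the candidate URLs that never occur among the visited URLs. */
    static List<String> antiJoin(List<String> candidateUrls, List<String> visitedUrls) {
        // Group both sides by key, as a co-group operation would.
        Map<String, List<String>> candidates = groupByKey(candidateUrls);
        Map<String, List<String>> visits = groupByKey(visitedUrls);

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : candidates.entrySet()) {
            List<String> matches =
                    visits.getOrDefault(group.getKey(), Collections.<String>emptyList());
            // Anti-join: emit the left group only if the right group is empty.
            if (matches.isEmpty()) {
                result.addAll(group.getValue());
            }
        }
        return result;
    }

    /** Groups equal URLs together, preserving first-seen order. */
    private static Map<String, List<String>> groupByKey(List<String> urls) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String url : urls) {
            groups.computeIfAbsent(url, k -> new ArrayList<>()).add(url);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("url_1", "url_2", "url_3");
        List<String> visited = Arrays.asList("url_2", "url_2");
        // prints "[url_1, url_3]"
        System.out.println(antiJoin(ranked, visited));
    }
}
```

This corresponds to the NOT EXISTS clause of the query: a page survives only if no qualifying visit shares its URL.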
Constructor and Description |
---|
WebLogAnalysis() |
public static void main(String[] args)
Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.