public class WebLogAnalysis extends Object
    SELECT
        r.pageURL,
        r.pageRank,
        r.avgDuration
    FROM Documents d JOIN Rankings r
        ON d.url = r.pageURL
    WHERE CONTAINS(d.contents, [keywords])
        AND r.pageRank > [rank]
        AND NOT EXISTS
        (
            SELECT * FROM Visits v
            WHERE v.destURL = d.url
                AND v.visitDate < [date]
        );
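The semantics of the query above can be sketched in plain Java over in-memory collections. This is only an illustration, not the Flink program itself: the class `WebLogQuerySketch`, the `Ranking` holder, and the `evaluate` method are hypothetical names. The sketch assumes that `CONTAINS` means "contains every keyword", that dates are ISO `yyyy-MM-dd` strings (so string order equals date order), and that the column names follow the table schemas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java illustration of the query; not part of the Flink example.
final class WebLogQuerySketch {

    /** One row of the Rankings table. */
    static final class Ranking {
        final int pageRank;
        final String pageURL;
        final int avgDuration;

        Ranking(int pageRank, String pageURL, int avgDuration) {
            this.pageRank = pageRank;
            this.pageURL = pageURL;
            this.avgDuration = avgDuration;
        }
    }

    /**
     * documents maps url -> contents; each visit is {destURL, visitDate}.
     * Assumption: a document matches only if it contains every keyword.
     */
    static List<Ranking> evaluate(Map<String, String> documents, List<Ranking> rankings,
                                  List<String[]> visits, List<String> keywords,
                                  int minRank, String cutoffDate) {
        // WHERE CONTAINS(d.contents, [keywords])
        Set<String> matchingDocs = new HashSet<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            boolean containsAll = true;
            for (String keyword : keywords) {
                if (!doc.getValue().contains(keyword)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                matchingDocs.add(doc.getKey());
            }
        }

        // Inner side of NOT EXISTS: URLs with at least one visit before the cutoff.
        Set<String> visitedBeforeCutoff = new HashSet<>();
        for (String[] visit : visits) {
            if (visit[1].compareTo(cutoffDate) < 0) {
                visitedBeforeCutoff.add(visit[0]);
            }
        }

        // Join on URL, apply the rank filter, then the anti-join.
        List<Ranking> result = new ArrayList<>();
        for (Ranking r : rankings) {
            if (r.pageRank > minRank
                    && matchingDocs.contains(r.pageURL)
                    && !visitedBeforeCutoff.contains(r.pageURL)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("url_a", "editors and oscillations");
        List<Ranking> ranks = Arrays.asList(new Ranking(50, "url_a", 10));
        List<String[]> visits = new ArrayList<>();
        for (Ranking r : evaluate(docs, ranks, visits,
                Arrays.asList("editors", "oscillations"), 40, "2007-01-01")) {
            System.out.println(r.pageURL + "|" + r.pageRank + "|" + r.avgDuration);
        }
    }
}
```

Building the two URL sets first keeps the final loop a simple membership test, which mirrors how the filters run before the join and the anti-join in the query.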
Input files are plain text CSV files using the pipe character ('|') as field separator.
The tables referenced in the query can be generated using the WebLogDataGenerator and have the following schemas:
    CREATE TABLE Documents (
        url VARCHAR(100) PRIMARY KEY,
        contents TEXT );

    CREATE TABLE Rankings (
        pageRank INT,
        pageURL VARCHAR(100) PRIMARY KEY,
        avgDuration INT );

    CREATE TABLE Visits (
        sourceIP VARCHAR(16),
        destURL VARCHAR(100),
        visitDate DATE,
        adRevenue FLOAT,
        userAgent VARCHAR(64),
        countryCode VARCHAR(3),
        languageCode VARCHAR(6),
        searchWord VARCHAR(32),
        duration INT );
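As a rough illustration of the input format, a pipe-separated line of the Rankings table could be parsed as follows. This is a minimal sketch, assuming one record per line with the fields in schema order; `PipeLineParser` and `RankingRow` are hypothetical names and are not part of the example program.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the Flink example uses its own CSV reader instead.
final class PipeLineParser {

    // '|' is a regex metacharacter, so it is quoted before compiling.
    private static final Pattern SEPARATOR = Pattern.compile(Pattern.quote("|"));

    /** One parsed row of the Rankings table. */
    static final class RankingRow {
        final int pageRank;
        final String pageURL;
        final int avgDuration;

        RankingRow(int pageRank, String pageURL, int avgDuration) {
            this.pageRank = pageRank;
            this.pageURL = pageURL;
            this.avgDuration = avgDuration;
        }
    }

    /** Splits a line on '|'; the limit of -1 keeps empty fields, including trailing ones. */
    static String[] splitFields(String line) {
        return SEPARATOR.split(line, -1);
    }

    /** Parses "pageRank|pageURL|avgDuration"; any further fields are ignored. */
    static RankingRow parseRanking(String line) {
        String[] fields = splitFields(line);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        return new RankingRow(
                Integer.parseInt(fields[0].trim()),
                fields[1],
                Integer.parseInt(fields[2].trim()));
    }

    public static void main(String[] args) {
        RankingRow row = parseRanking("85|url_1|23");
        System.out.println(row.pageURL + " has rank " + row.pageRank);
    }
}
```

Quoting the separator matters because an unquoted `|` is the regex alternation operator and would split the line between every character.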
Usage: WebLogAnalysis --documents <path> --ranks <path> --visits <path> --result <path>
If no parameters are provided, the program is run with default data from WebLogData.
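The command-line convention above can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not the program's actual argument handling: `WebLogArgsSketch`, `parse`, and `useDefaultData` are hypothetical names, and the fallback rule simply restates the sentence above (no parameters means default data).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the example's argument handling.
final class WebLogArgsSketch {

    /** Collects "--key value" pairs; tokens that do not fit this shape are skipped. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            boolean isKey = args[i].startsWith("--");
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
            if (isKey && hasValue) {
                params.put(args[i].substring(2), args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        return params;
    }

    /** Assumption: built-in data is used only when no parameters were given at all. */
    static boolean useDefaultData(Map<String, String> params) {
        return params.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        if (useDefaultData(params)) {
            System.out.println("Running with default data");
        } else {
            System.out.println("documents=" + params.get("documents")
                    + ", ranks=" + params.get("ranks")
                    + ", visits=" + params.get("visits")
                    + ", result=" + params.get("result"));
        }
    }
}
```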
This example shows how to use:
Modifier and Type | Class and Description |
---|---|
static class | WebLogAnalysis.AntiJoinVisits: CoGroupFunction that realizes an anti-join. |
static class | WebLogAnalysis.FilterByRank: MapFunction that filters for records where the rank exceeds a certain threshold. |
static class | WebLogAnalysis.FilterDocByKeyWords: MapFunction that filters for documents that contain a certain set of keywords. |
static class | WebLogAnalysis.FilterVisitsByDate: MapFunction that filters for records of the visits relation where the year (from the date string) is equal to a certain value. |
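The idea behind an anti-join expressed as a co-group can be sketched without any framework: group both inputs by key, then emit the left-side records only for keys whose right-side group is empty. The following is a minimal plain-Java sketch of that idea, not the code of `AntiJoinVisits`; the class `AntiJoinSketch` and its `antiJoin` method are hypothetical, and keys are assumed to be strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of a co-group based anti-join.
final class AntiJoinSketch {

    /** Returns the left records whose key has no matching record on the right. */
    static <L, R> List<L> antiJoin(List<L> left, Function<L, String> leftKey,
                                   List<R> right, Function<R, String> rightKey) {
        // Group the left input by key, preserving first-seen key order.
        Map<String, List<L>> leftGroups = new LinkedHashMap<>();
        for (L record : left) {
            leftGroups.computeIfAbsent(leftKey.apply(record), k -> new ArrayList<>()).add(record);
        }
        // Group the right input by key.
        Map<String, List<R>> rightGroups = new HashMap<>();
        for (R record : right) {
            rightGroups.computeIfAbsent(rightKey.apply(record), k -> new ArrayList<>()).add(record);
        }
        // Co-group step: look at both groups per key, emit left only if right is empty.
        List<L> out = new ArrayList<>();
        for (Map.Entry<String, List<L>> group : leftGroups.entrySet()) {
            List<R> rightGroup =
                    rightGroups.getOrDefault(group.getKey(), Collections.<R>emptyList());
            if (rightGroup.isEmpty()) {
                out.addAll(group.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rankings as {pageRank, pageURL}, visits as {destURL}.
        List<String[]> ranks = Arrays.asList(
                new String[] {"50", "url_a"}, new String[] {"60", "url_b"});
        List<String[]> visits = Arrays.asList(new String[][] {{"url_b"}});
        for (String[] r : antiJoin(ranks, r -> r[1], visits, v -> v[0])) {
            System.out.println(r[1]);
        }
    }
}
```

A plain join cannot express this directly because it only produces output for keys present on both sides, whereas the co-group step also sees keys whose right-side group is empty.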
Constructor and Description |
---|
WebLogAnalysis() |
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.