# 📑 balboa [![CircleCI](https://circleci.com/gh/DCSO/balboa.svg?style=svg)](https://circleci.com/gh/DCSO/balboa)

balboa is the BAsic Little Book Of Answers. It consumes and indexes observations from [passive DNS](https://www.farsightsecurity.com/technical/passive-dns/) collection, providing a [GraphQL](https://graphql.org/) interface to access the aggregated contents of the observations database. We built balboa to handle passive DNS data aggregated from metadata gathered by [Suricata](https://suricata-ids.org).

The API should be suitable for integration into existing multi-source observable integration frameworks. It is possible to produce results in a [Common Output Format](https://datatracker.ietf.org/doc/draft-dulaunoy-dnsop-passive-dns-cof/) compatible schema using the GraphQL API.
In fact, the GraphQL schema is modelled after the COF fields.

The balboa software

- is fast for queries and input/updates
- implements persistent, compressed storage of observations
- supports tracking and specifically querying multiple sensors
- makes use of multi-core systems
- can accept input from multiple sources simultaneously
  - HTTP (POST)
  - AMQP
  - GraphQL
  - Unix socket
- accepts various (text-based) input formats
  - JSON-based
    - [FEVER](https://github.com/DCSO/fever)
    - [gopassivedns](https://github.com/Phillipmartin/gopassivedns)
    - [Packetbeat](https://www.elastic.co/guide/en/beats/packetbeat/master/packetbeat-dns-options.html) (via Logstash)
    - [Suricata EVE DNS v1 and v2](http://suricata.readthedocs.io/en/latest/output/eve/eve-json-format.html#event-type-dns)
  - flat file
    - Edward Fjellskål's [PassiveDNS](https://github.com/gamelinux/passivedns) tabular format (default order `-f SMcsCQTAtn`)

## Building and Installation

```text
$ go get github.com/DCSO/balboa
...
```

To build the backends:

```text
$ cd $GOPATH/src/github.com/DCSO/balboa/backends
$ make
...
```

This will create a binary executable in the `build/` subdirectory of each backend's directory.

### Dependencies

- Go 1.7 or later
- [RocksDB](https://rocksdb.org/) 5.0 or later (shared lib, with LZ4 support)

On Debian (testing and stretch-backports), one can satisfy these dependencies with:

```text
% apt install golang-go librocksdb-dev
...
```

## Usage

### Configuring feeders

Feeders are used to get observations into the database. They run concurrently and process input in the background, making results accessible via the query interface as soon as the resulting upsert transactions have been completed in the database. The feeders to be created are defined in a YAML configuration file (passed to `balboa serve` via the `-f` parameter).
Example:

```yaml
feeder:
  - name: AMQP Input
    type: amqp
    url: amqp://guest:guest@localhost:5672
    exchange: [ tdh.pdns ]
    input_format: fever_aggregate
  - name: HTTP Input
    type: http
    listen_host: 127.0.0.1
    listen_port: 8081
    input_format: fever_aggregate
  - name: Socket Input
    type: socket
    path: /tmp/balboa.sock
    input_format: gopassivedns
```

A balboa instance given this feeder configuration would support the following input options:

- JSON in FEVER's aggregate format, delivered via AMQP from a temporary queue attached to the exchange `tdh.pdns` on `localhost` port 5672, authenticated with user `guest` and password `guest`
- JSON in FEVER's aggregate format, parsed from HTTP POST requests on port 8081 on the local system
- JSON in gopassivedns's format, fed into the UNIX socket `/tmp/balboa.sock` created by balboa

All of these feeders accept input simultaneously; no distinction is made as to where an observation came from. It is possible to specify multiple feeders of the same type but with different settings, as long as their `name`s are unique.

### Configuring the database backend

Multiple database backends are supported for storing pDNS observations persistently. Each database backend is provided as a self-contained binary (executable). The frontend connects to exactly one database backend; the backend, however, supports multiple client or frontend connections.

### Running the backend and frontend services, consuming input

All interaction with the frontend on the command line takes place via the `balboa` frontend executable. The frontend depends on a backend service.
E.g., the RocksDB backend can be started using:

```text
$ balboa-rocksdb -h
`balboa-rocksdb` provides a pdns database backend for `balboa`

Usage: balboa-rocksdb [options]

    -h display help
    -D daemonize (default: off)
    -d <path> path to rocksdb database (default: `/tmp/balboa-rocksdb`)
    -l listen address (default: 127.0.0.1)
    -p listen port (default: 4242)
    -v increase verbosity; can be passed multiple times
    -j thread throttle limit, maximum concurrent connections (default: 64)
    --membudget <memory-in-bytes> rocksdb membudget option (value: 134217728)
    --parallelism <number-of-threads> rocksdb parallelism option (value: 8)
    --max_log_file_size <size> rocksdb log file size option (value: 10485760)
    --max_open_files <number> rocksdb max number of open files (value: 300)
    --keep_log_file_num <number> rocksdb max number of log files (value: 2)
    --database_path <path> same as `-d`
    --version show version then exit

$ balboa-rocksdb --database_path /data/pdns -l 127.0.0.1 -p 4242
```

After starting the backend, the `balboa` frontend can be started as follows:

```text
$ balboa serve -l '' --host 127.0.0.1:4242
INFO[0000] starting feeder AMQPInput2
INFO[0000] starting feeder HTTP Input
INFO[0000] accepting submissions on port 8081
INFO[0000] starting feeder Socket Input
INFO[0000] starting feeder Suricata Socket Input
INFO[0000] ConsumeFeed() starting
INFO[0000] serving GraphQL on port 8080
...
```

After startup, the feeders are free to be used for data ingest. For example, one might do some of the following to test data consumption (assuming the feeders above are used):

- for AMQP:

  ```text
  $ scripts/mkjson.py | rabbitmqadmin publish routing_key="" exchange=tdh.pdns
  ...
  ```

- for HTTP:

  ```text
  $ scripts/mkjson.py | curl -d@- -qs --header "X-Sensor-ID: abcde" http://localhost:8081/submit
  ...
  ```

- for socket:

  ```text
  $ sudo gopassivedns -dev eth0 | socat /tmp/balboa.sock STDIN
  ...
  ```

### Querying the server

The intended main interface for interacting with the server is GraphQL.
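Queries can also be issued programmatically by POSTing to the GraphQL service (port 8080 in the startup log above). The following Go sketch is illustrative only: the endpoint path (`/query` here) and the standard `{"query": ...}` JSON envelope are assumptions to be checked against your deployment.

```go
// graphql_query.go -- sketch of a programmatic query against balboa's
// GraphQL service. ASSUMPTIONS: the endpoint path "/query" and the
// {"query": "..."} request envelope; adjust both to your deployment.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildRequest wraps a GraphQL query string in a JSON envelope.
func buildRequest(query string) ([]byte, error) {
	return json.Marshal(map[string]string{"query": query})
}

func main() {
	query := `query { entries(rrname: "test.foobar.de", limit: 1) { rrname rrtype rdata count } }`

	body, err := buildRequest(query)
	if err != nil {
		panic(err)
	}

	// POST to the GraphQL service; port 8080 matches the startup log above.
	resp, err := http.Post("http://localhost:8080/query", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()

	// Decode only the fields requested above.
	var result struct {
		Data struct {
			Entries []struct {
				RRName string `json:"rrname"`
				RRType string `json:"rrtype"`
				RData  string `json:"rdata"`
				Count  int    `json:"count"`
			} `json:"entries"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, e := range result.Data.Entries {
		fmt.Printf("%s %s %s (seen %d times)\n", e.RRName, e.RRType, e.RData, e.Count)
	}
}
```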
For example, the query

```graphql
query {
  entries(rrname: "test.foobar.de", sensor_id: "abcde", limit: 1) {
    rrname
    rrtype
    rdata
    time_first
    time_last
    sensor_id
    count
  }
}
```

would return something like

```json
{
  "data": {
    "entries": [
      {
        "rrname": "test.foobar.de",
        "rrtype": "A",
        "rdata": "1.2.3.4",
        "time_first": 1531943211,
        "time_last": 1531949570,
        "sensor_id": "abcde",
        "count": 3
      }
    ]
  }
}
```

This also works with `rdata` as the query parameter, but at least one of `rrname` or `rdata` must be stated. If there is no `sensor_id` parameter, all results will be returned, regardless of where the DNS answer was observed. Use `time_first_rfc3339` and `time_last_rfc3339` instead of `time_first` and `time_last`, respectively, to get human-readable timestamps.

### Aliases

Sometimes it is interesting to ask for all the domain names that resolve to the same IP address. For this reason, the GraphQL API supports a virtual `aliases` field that returns all entries with RRType `A` or `AAAA` that share the same address in the Rdata field. Example:

```graphql
{
  entries(rrname: "heise.de", rrtype: A) {
    rrname
    rdata
    rrtype
    time_first_rfc3339
    time_last_rfc3339
    aliases {
      rrname
    }
  }
}
```

```json
{
  "data": {
    "entries": [
      {
        "rrname": "heise.de",
        "rdata": "193.99.144.80",
        "rrtype": "A",
        "time_first_rfc3339": "2018-07-10T08:05:45Z",
        "time_last_rfc3339": "2018-10-18T09:24:38Z",
        "aliases": [
          { "rrname": "ct.de" },
          { "rrname": "ix.de" },
          { "rrname": "redirector.heise.de" },
          { "rrname": "www.ix.de" }
        ]
      }
    ]
  }
}
```

### Bulk queries

There is also a shortcut tool to make 'bulk' querying easier.
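Conceptually, a bulk query over an address range boils down to enumerating the addresses in a CIDR block and issuing one rdata lookup per address. The per-address query itself is elided here; this Go sketch only shows the enumeration step, as one plausible way to implement it (not necessarily how the tool does it internally):

```go
// cidr_expand.go -- sketch of the expansion step behind a 'bulk' range
// query: list every address in a CIDR block; each address would then be
// queried individually as rdata.
package main

import (
	"fmt"
	"net"
)

// expandCIDR lists all addresses in the given CIDR block (IPv4 only, for
// brevity). A /16 yields 65536 addresses, which is why a server-side
// range query would be considerably cheaper.
func expandCIDR(cidr string) ([]string, error) {
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	var addrs []string
	for ip := ip.Mask(ipnet.Mask); ipnet.Contains(ip); inc(ip) {
		addrs = append(addrs, ip.String())
	}
	return addrs, nil
}

// inc increments an IP address in place, carrying across octets.
func inc(ip net.IP) {
	for i := len(ip) - 1; i >= 0; i-- {
		ip[i]++
		if ip[i] != 0 {
			break
		}
	}
}

func main() {
	addrs, err := expandCIDR("1.2.3.0/30")
	if err != nil {
		panic(err)
	}
	// Each printed address would become one individual query.
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```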
For example, to get all the information on the hosts in range 1.2.0.0/16 as observed by sensor `abcde`, one can use:

```text
$ balboa query --sensor abcde 1.2.0.0/16
{"count":6,"time_first":1531943211,"time_last":1531949570,"rrtype":"A","rrname":"test.foobar.de","rdata":"1.2.3.4","sensor_id":"abcde"}
{"count":1,"time_first":1531943215,"time_last":1531949530,"rrtype":"A","rrname":"baz.foobar.de","rdata":"1.2.3.7","sensor_id":"abcde"}
```

Note that this tool currently just performs many concurrent individual queries. To improve performance in these cases, it might be worthwhile to also support range queries on the server side in the future.

### Other tools

Run `balboa` without arguments to list the available subcommands and get a short description of what they do. See also `README.md` in the `backends` directory.

## Author/Contact

Sascha Steinbiss

## License

BSD-3-clause